20.
September 2002
This is the DATAPLOT News file DPNEWF.TEX. This NEWS file contains a
list of DATAPLOT enhancements over the last few years. This is
typically the only place that the most recent enhancements are
documented.
To get a hardcopy off-line listing of this file, exit DATAPLOT and
enter:
IBM PC: PRINT C:\DATAPLOT\DPNEWF.TEX
UNIX: lpr /usr/local/lib/dataplot/dpnewf.tex
VAX: PRINT DATAPLO$:DPNEWF.TEX (where DATAPLO$ defines the
directory where DATAPLOT auxiliary files are kept)
other: Check with your local DATAPLOT installer;
at NIST: Alan Heckert (301-975-2899)
Jim Filliben (301-975-2855)
Your installation may define the directory where the DATAPLOT
auxiliary files are stored differently from the list above.
-----------------------------------------------------------------------
The following enhancements were made to DATAPLOT
July 2007 - February 2008.
-----------------------------------------------------------------------
1) The following updates were made for probability
distributions.
a) Added the following new continuous distributions.
1) Burr Type 2
BU2CDF(X,R) - cdf function
BU2PDF(X,R) - pdf function
BU2PPF(P,R) - ppf function
2) Burr Type 3
BU3CDF(X,R,K) - cdf function
BU3PDF(X,R,K) - pdf function
BU3PPF(P,R,K) - ppf function
3) Burr Type 4
BU4CDF(X,R,C) - cdf function
BU4PPF(P,R,C) - ppf function
4) Burr Type 5
BU5CDF(X,R,K) - cdf function
BU5PDF(X,R,K) - pdf function
BU5PPF(P,R,K) - ppf function
5) Burr Type 6
BU6CDF(X,R,K) - cdf function
BU6PDF(X,R,K) - pdf function
BU6PPF(P,R,K) - ppf function
6) Burr Type 7
BU7CDF(X,R) - cdf function
BU7PDF(X,R) - pdf function
BU7PPF(P,R) - ppf function
7) Burr Type 8
BU8CDF(X,R) - cdf function
BU8PDF(X,R) - pdf function
BU8PPF(P,R) - ppf function
8) Burr Type 9
BU9CDF(X,R,K) - cdf function
BU9PDF(X,R,K) - pdf function
BU9PPF(P,R,K) - ppf function
9) Burr Type 10
B10CDF(X,R) - cdf function
B10PDF(X,R) - pdf function
B10PPF(P,R) - ppf function
10) Burr Type 11
B11CDF(X,R) - cdf function
B11PDF(X,R) - pdf function
B11PPF(P,R) - ppf function
11) Burr Type 12
B12CDF(X,C,K) - cdf function
B12PDF(X,C,K) - pdf function
B12PPF(P,C,K) - ppf function
12) DOUBLY PARETO UNIFORM
DPUCDF(X,M,N,ALPHA,BETA) - cdf function
DPUPDF(X,M,N,ALPHA,BETA) - pdf function
DPUPPF(P,M,N,ALPHA,BETA) - ppf function
13) KUMARASWAMY
KUMCDF(X,ALPHA,BETA) - cdf function
KUMPDF(X,ALPHA,BETA) - pdf function
KUMPPF(P,ALPHA,BETA) - ppf function
14) UNEVEN TWO-SIDED POWER
UTSCDF(X,A,B,D,NU1,NU3,ALPHA) - cdf function
UTSPDF(X,A,B,D,NU1,NU3,ALPHA) - pdf function
UTSPPF(P,A,B,D,NU1,NU3,ALPHA) - ppf function
15) SLOPE
SLOCDF(X,ALPHA) - cdf function
SLOPDF(X,ALPHA) - pdf function
SLOPPF(P,ALPHA) - ppf function
16) TWO-SIDED SLOPE
TSSCDF(X,ALPHA,THETA) - cdf function
TSSPDF(X,ALPHA,THETA) - pdf function
TSSPPF(P,ALPHA,THETA) - ppf function
17) OGIVE
OGICDF(X,N) - cdf function
OGIPDF(X,N) - pdf function
OGIPPF(P,N) - ppf function
18) TWO-SIDED OGIVE
TSOCDF(X,N,THETA) - cdf function
TSOPDF(X,N,THETA) - pdf function
TSOPPF(P,N,THETA) - ppf function
19) REFLECTED POWER FUNCTION
RPOCDF(X,C) - cdf function
RPOCHAZ(X,C) - cumulative hazard function
RPOHAZ(X,C) - hazard function
RPOPDF(X,C) - pdf function
RPOPPF(P,C) - ppf function
20) POWER FUNCTION
POWCHAZ(X,C) - cumulative hazard function
POWHAZ(X,C) - hazard function
The cdf, pdf, and ppf functions were already
available.
21) WAKEBY
WAKPDF(X,BETA,GAMMA,DELTA) - pdf function
The cdf and ppf functions were added in a
previous release.
22) Muth
MUTCDF(X,BETA) - cdf function
MUTPDF(X,BETA) - pdf function
MUTPPF(P,BETA) - ppf function
23) Logistic-Exponential
LEXCDF(X,BETA) - cdf function
LEXCHAZ(X,BETA) - cumulative hazard function
LEXHAZ(X,BETA) - hazard function
LEXPDF(X,BETA) - pdf function
LEXPPF(P,BETA) - ppf function
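As an illustration of how the cdf, pdf, and ppf functions in the list above relate to one another, the following Python sketch implements the standard (location 0, scale 1) Burr type 12 distribution using the textbook form F(x) = 1 - (1 + x**c)**(-k). The lowercase function names are Python stand-ins that only mirror the Dataplot names; the parameterization is assumed to match B12CDF/B12PDF/B12PPF and has not been checked against the Dataplot source.

```python
def b12cdf(x, c, k):
    """Standard Burr type 12 cdf: F(x) = 1 - (1 + x**c)**(-k) for x > 0."""
    if x <= 0:
        return 0.0
    return 1.0 - (1.0 + x ** c) ** (-k)


def b12pdf(x, c, k):
    """Derivative of the cdf: f(x) = c*k*x**(c-1) * (1 + x**c)**(-k-1)."""
    if x <= 0:
        return 0.0
    return c * k * x ** (c - 1) * (1.0 + x ** c) ** (-k - 1)


def b12ppf(p, c, k):
    """Inverse of the cdf: x = ((1 - p)**(-1/k) - 1)**(1/c) for 0 <= p < 1."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    return ((1.0 - p) ** (-1.0 / k) - 1.0) ** (1.0 / c)
```

The ppf is the inverse of the cdf, so b12ppf(b12cdf(x, c, k), c, k) should return x up to rounding error.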
b) The definitions for the exponential power, alpha, and
Maxwell distributions were modified from
PEXCDF(X,ALPHA,BETA,LOC,SCALE)
PEXHAZ(X,ALPHA,BETA,LOC,SCALE)
PEXCHAZ(X,ALPHA,BETA,LOC,SCALE)
PEXPDF(X,ALPHA,BETA,LOC,SCALE)
PEXPPF(P,ALPHA,BETA,LOC,SCALE)
ALPCDF(X,ALPHA,BETA,LOC,SCALE)
ALPHAZ(X,ALPHA,BETA,LOC,SCALE)
ALPCHAZ(X,ALPHA,BETA,LOC,SCALE)
ALPPDF(X,ALPHA,BETA,LOC,SCALE)
ALPPPF(P,ALPHA,BETA,LOC,SCALE)
MAXCDF(X,SIGMA,LOC,SCALE)
MAXPDF(X,SIGMA,LOC,SCALE)
MAXPPF(P,SIGMA,LOC,SCALE)
to
PEXCDF(X,BETA,LOC,SCALE)
PEXHAZ(X,BETA,LOC,SCALE)
PEXCHAZ(X,BETA,LOC,SCALE)
PEXPDF(X,BETA,LOC,SCALE)
PEXPPF(P,BETA,LOC,SCALE)
ALPCDF(X,ALPHA,LOC,SCALE)
ALPHAZ(X,ALPHA,LOC,SCALE)
ALPCHAZ(X,ALPHA,LOC,SCALE)
ALPPDF(X,ALPHA,LOC,SCALE)
ALPPPF(P,ALPHA,LOC,SCALE)
MAXCDF(X,LOC,SCALE)
MAXPDF(X,LOC,SCALE)
MAXPPF(P,LOC,SCALE)
This reflects the fact that the ALPHA parameter for the
exponential power distribution, the BETA parameter for the
alpha distribution, and the SIGMA parameter for the Maxwell
distribution are in fact scale parameters. The random numbers,
probability plots, ppcc/ks plots, and Kolmogorov-Smirnov
and chi-square goodness of fit tests were
updated to reflect this change as well.
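The reason a separate SIGMA argument is redundant can be seen in a small Python sketch for the Maxwell case. It assumes the usual standard Maxwell cdf, erf(z/sqrt(2)) - sqrt(2/pi)*z*exp(-z**2/2), and it assumes that the old SIGMA argument entered only through the product SIGMA*SCALE. Both are assumptions made for illustration, not statements about the Dataplot source code.

```python
import math


def maxwell_cdf_standard(z):
    """Standard Maxwell cdf (location 0, scale 1)."""
    if z <= 0:
        return 0.0
    return math.erf(z / math.sqrt(2.0)) - math.sqrt(2.0 / math.pi) * z * math.exp(-z * z / 2.0)


def maxcdf(x, loc=0.0, scale=1.0):
    """New-style call: only location and scale."""
    return maxwell_cdf_standard((x - loc) / scale)


def maxcdf_old(x, sigma, loc=0.0, scale=1.0):
    """Hypothetical old-style call: sigma acts as a second scale factor."""
    return maxwell_cdf_standard((x - loc) / (sigma * scale))
```

Under these assumptions, an old call with SIGMA = 2 and SCALE = 1.5 gives the same value as a new call with SCALE = 3.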
c) Added support for maximum likelihood estimation for
the following distributions:
Reflected generalized Topp and Leone
Burr type 10
Wakeby (actually generates L-Moments estimates)
exponential power
2) Added the following statistics:
LET A = LP LOCATION X
LET A = LP VARIANCE X
LET A = LP SD X
These statistics are supported by the following commands:
PLOT
TABULATE
CROSS TABULATE
CROSS TABULATE PLOT
LET Y = CROSS TABULATE
LET Y = MATRIX M
BOOTSTRAP PLOT
JACKNIFE PLOT
INFLUENCE CURVE
BLOCK PLOT
DEX PLOT
3) Added the following for graphics output devices.
a) Added the following device drivers
AQUA - Aquaterm for Mac OSX systems
Enter HELP AQUA for details.
b) Added the following command
SET POSTSCRIPT CONVERT CONVERT
This is an enhancement to the previously available
command SET POSTSCRIPT CONVERT. The SET POSTSCRIPT CONVERT
command uses the Ghostscript command to automatically
convert Dataplot Postscript output to one of the listed
image formats. One limitation was that the Ghostscript
command did not provide a command line switch to
generate a landscape orientation plot (which most
Dataplot graphs need). The "CONVERT CONVERT" option
uses the "convert" program in ImageMagick instead of
Ghostscript. This option does support landscape
mode.
-----------------------------------------------------------------------
The following enhancements were made to DATAPLOT
March 2007 - July 2007.
-----------------------------------------------------------------------
1) We have made the following updates for categorical data
analysis.
There are two basic types of data that the following
commands address.
a) We have two variables, each with n observations, where
the first can have one of r mutually exclusive values
and the second can have one of c mutually exclusive values.
So each observation will fit into exactly one of the
r levels of variable one and exactly one of the c levels
of variable two.
Your data can be either in raw form (two columns of data
each with n rows) or summary form (an rxc table which
will typically be read into Dataplot as a matrix).
Each entry in the summary table is a count of how many
times that particular combination occurred.
b) If each variable can have exactly two outcomes (typically
coded as 1/0), then we have the 2x2 special case. There
are a number of specialized methods for dealing with
this type of data.
For this type of data, the number of observations for
the two variables need not be equal.
Some examples of this type of data are:
i) We have a diagnostic test to detect a disease.
Variable one specifies whether the patient in
fact has the disease (coded as 1) or not (coded
as 0). Variable two specifies whether the test
detected the disease (coded as 1) or not (coded
as 0).
ii) We are testing instruments to determine whether or
not they can detect a particular substance. Variable
one is the ground truth (coded as 1 when the substance
is present and coded as 0 when it is not). Variable
two denotes whether the instrument detected the
substance (1 for detected, 0 for not detected).
The following capabilities have been added to Dataplot
for analyzing these types of data.
a) The following statistical tests were added:
ODDS RATIO INDEPENDENCE TEST N11 N21 N12 N22
ODDS RATIO INDEPENDENCE TEST Y1 Y2
ODDS RATIO INDEPENDENCE TEST M
CHI-SQUARE INDEPENDENCE TEST N11 N21 N12 N22
CHI-SQUARE INDEPENDENCE TEST Y1 Y2
CHI-SQUARE INDEPENDENCE TEST M
FISHER EXACT TEST N11 N21 N12 N22
FISHER EXACT TEST Y1 Y2
FISHER EXACT TEST M
MCNEMAR TEST N11 N21 N12 N22
MCNEMAR TEST Y1 Y2
MCNEMAR TEST M
ODDS RATIO CHI-SQUARE TEST Y1 Y2
ODDS RATIO CHI-SQUARE TEST Y1 Y2 X
ODDS RATIO CHI-SQUARE TEST Y1 X1 Y2 X2
MANTEL-HAENSZEL TEST Y1 Y2
MANTEL-HAENSZEL TEST Y1 Y2 X
MANTEL-HAENSZEL TEST Y1 X1 Y2 X2
b) Added the following statistics:
LET A = ODDS RATIO X1 X2
LET A = ODDS RATIO STANDARD ERROR X1 X2
LET A = LOG ODDS RATIO X1 X2
LET A = LOG ODDS RATIO STANDARD ERROR X1 X2
LET A = RELATIVE RISK X1 X2
LET A = CRAMER CONTINGENCY COEFFICIENT X1 X2
LET A = MATRIX GRAND CRAMER CONTINGENCY COEFFICIENT M
LET A = PEARSON CONTINGENCY COEFFICIENT X1 X2
LET A = MATRIX GRAND PEARSON CONTINGENCY COEFFICIENT M
LET A = FALSE POSITIVE Y1 Y2
LET A = FALSE NEGATIVE Y1 Y2
LET A = TRUE POSITIVE Y1 Y2
LET A = TRUE NEGATIVE Y1 Y2
LET A = TEST SENSITIVITY Y1 Y2
LET A = TEST SPECIFICITY Y1 Y2
LET A = POSITIVE PREDICTIVE VALUE Y1 Y2
LET A = NEGATIVE PREDICTIVE VALUE Y1 Y2
These statistics are supported by the following commands:
PLOT
TABULATE
CROSS TABULATE
CROSS TABULATE PLOT
BOOTSTRAP PLOT
JACKNIFE PLOT
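For readers who want to check the 2x2 statistics listed above by hand, the following Python sketch computes several of them from paired 1/0 data using the standard textbook definitions. It assumes the first variable is the ground truth and the second is the test result, that both have the same length, and that no cell of the table is zero. Whether Dataplot reports FALSE POSITIVE and related statistics as counts or as proportions is not stated here, so the sketch returns the raw counts separately. All function names are hypothetical.

```python
import math


def two_by_two_counts(truth, test):
    """Return (tp, fp, fn, tn) from paired 1/0 sequences, truth first."""
    if len(truth) != len(test):
        raise ValueError("this sketch requires paired observations")
    pairs = list(zip(truth, test))
    tp = sum(1 for t, d in pairs if t == 1 and d == 1)
    fp = sum(1 for t, d in pairs if t == 0 and d == 1)
    fn = sum(1 for t, d in pairs if t == 1 and d == 0)
    tn = sum(1 for t, d in pairs if t == 0 and d == 0)
    return tp, fp, fn, tn


def two_by_two_statistics(tp, fp, fn, tn):
    """Textbook 2x2 summary statistics; every cell must be non-zero."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "positive_predictive_value": tp / (tp + fp),
        "negative_predictive_value": tn / (tn + fn),
        "odds_ratio": (tp * tn) / (fp * fn),
        "log_odds_ratio": math.log((tp * tn) / (fp * fn)),
        "log_odds_ratio_se": math.sqrt(1 / tp + 1 / fp + 1 / fn + 1 / tn),
    }
```

With 10 diseased and 10 healthy patients, 8 true positives and 3 false positives, the sensitivity is 0.8 and the specificity is 0.7.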
c) Added the following graphics:
ROC CURVE Y1 Y2 X - generate a ROC curve
ROSE PLOT Y - generate a rose plot (also known as a four-fold plot)
ROSE PLOT Y1 Y2
BINARY TABULATION PLOT Y1 Y2 X1 X2
BINARY PLOT Y1 Y2 X1
where the statistic to be plotted is one of:
CORRECT MATCH
FALSE POSITIVE
FALSE NEGATIVE
TRUE POSITIVE
TRUE NEGATIVE
These "binary" plots are used to generate summary
plots of "1/0" type data across groups.
ASSOCIATION PLOT M - generate an association plot
ASSOCIATION PLOT Y1 Y2
ASSOCIATION PLOT N11 N21 N12 N22
SIEVE PLOT M - generate a sieve plot
SIEVE PLOT Y1 Y2
SIEVE PLOT N11 N21 N12 N22
2) We have made the following updates for probability
distributions.
a) Maximum likelihood estimates were added for the
following distributions:
Katz (generates moment estimates)
slash
triangular
four parameter beta (generates moment estimates)
log beta
beta normal
The maximum likelihood for the two-sided power distribution
was generalized to include the lower and upper limit
parameters.
The slash and triangular distributions have also been
added to the BOOTSTRAP/JACKNIFE MLE PLOT command:
BOOTSTRAP TRIANGULAR MLE PLOT Y
JACKNIFE TRIANGULAR MLE PLOT Y
BOOTSTRAP SLASH MLE PLOT Y
JACKNIFE SLASH MLE PLOT Y
The maximum likelihood estimation for the
two-sided power distribution was updated from the
standard case (lower and upper limits = 0 and 1)
to the general case (lower and upper limits will be
estimated from the data). Also, the ML procedure for
this distribution only applies if the N shape parameter
is > 1.
b) Added the following commands for binomial confidence
intervals:
LET A = EXACT BINOMIAL LOWER BOUND P N ALPHA
LET A = EXACT BINOMIAL UPPER BOUND P N ALPHA
LET ALOW AUPP = AGRESTI COULL LIMITS P N ALPHA
The BINOMIAL MAXIMUM LIKELIHOOD command can generate
these values for raw data. The above LET commands are
useful when you only have summary data (i.e., the p and n).
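The Agresti-Coull limits can be reproduced from summary data with the standard published formula, as in the Python sketch below. It assumes that P is the observed proportion, N is the number of trials, and ALPHA is the significance level (for example 0.05 for a 95% interval). The argument order follows the LET command above, but the meaning of ALPHA in Dataplot is an assumption, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def agresti_coull_limits(p, n, alpha):
    """Two-sided Agresti-Coull interval for a binomial proportion."""
    if n <= 0 or not 0.0 < alpha < 1.0:
        raise ValueError("need n > 0 and 0 < alpha < 1")
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # normal percent point
    x = p * n                                    # number of successes
    n_adj = n + z * z                            # adjusted sample size
    p_adj = (x + z * z / 2.0) / n_adj            # adjusted proportion
    half_width = z * math.sqrt(p_adj * (1.0 - p_adj) / n_adj)
    # Clip to [0, 1] since the interval is for a probability.
    return max(0.0, p_adj - half_width), min(1.0, p_adj + half_width)
```

For p = 0.5 and n = 100 the adjusted proportion stays at 0.5, so the interval is symmetric about 0.5.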
c) Added the following plots:
POISSON PLOT Y X
GEOMETRIC PLOT Y X
BINOMIAL PLOT Y X
NEGATIVE BINOMIAL PLOT Y X
LOGARITHMIC SERIES PLOT Y X
These plots are alternatives to the PROBABILITY PLOT
command.
ORD PLOT Y
This plot can help distinguish whether a Poisson,
a negative binomial, or a logarithmic series
distribution provides a more appropriate distributional
model for a set of discrete data.
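The quantity usually plotted in an Ord plot is u(k) = k*n(k)/n(k-1) against k, where n(k) is the number of observations equal to k. The Python sketch below computes these points from a frequency table. It follows the general literature on Ord plots rather than the Dataplot implementation, which may weight or filter points differently. In the usual reading, a roughly horizontal pattern points toward the Poisson and an increasing pattern toward the negative binomial or logarithmic series. The function name is hypothetical.

```python
def ord_plot_points(counts):
    """counts[k] is the frequency of the value k (k = 0, 1, 2, ...).

    Returns a list of (k, u_k) with u_k = k * counts[k] / counts[k - 1],
    skipping any k where either frequency is zero.
    """
    points = []
    for k in range(1, len(counts)):
        if counts[k - 1] > 0 and counts[k] > 0:
            points.append((k, k * counts[k] / counts[k - 1]))
    return points
```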
3) Made the following updates to graphics commands.
a) The HISTOGRAM command now accepts a matrix argument.
b) Added the command
BIVARIATE NORMAL TOLERANCE REGION PLOT Y1 Y2 X
4) Added the following statistics:
LET P1 =
LET P2 =
LET A = TRIMMED STANDARD DEVIATION Y
5) Added the following command
SET FATAL ERROR <IGNORE/TERMINATE/PROMPT>
If an analysis or graphics command returns an error code,
this command tells Dataplot how to respond:
IGNORE - Dataplot will simply continue processing the
next command. This was the behavior before
this command was added and is the default.
TERMINATE - Dataplot will print a message and terminate
immediately.
PROMPT - Dataplot will prompt whether you want to
continue or terminate.
This command was added primarily as a debugging option.
If you are trying to debug a complex macro, it can be helpful
to have Dataplot terminate (or prompt for termination)
in order to locate where the initial error is occurring.
Note that this command is not active if you are running
the Graphical User Interface (GUI) version.
6) A Windows Vista installation is now available.
7) Fixed a number of miscellaneous bugs.
-----------------------------------------------------------------------
The following enhancements were made to DATAPLOT
May 2006 - February 2007.
-----------------------------------------------------------------------
1) The following updates were made for maximum likelihood estimates
for distributions:
a) The negative binomial was updated to distinguish between
two cases: 1) the case where k is assumed known (p is
estimated) and 2) the case where k is assumed unknown.
For case 1), confidence limits for p were added.
b) Maximum likelihood estimates were added for the
following discrete distributions:
zeta
Borel-Tanner
Lagrange-Poisson
lost games
beta-geometric
Polya-Aeppli
generalized logarithmic series
geeta
Consul
quasi binomial type I
generalized lost games
generalized negative binomial
topp and leone
c) The binomial mle was updated in the following ways:
1) For exact intervals, fixed a bug for extreme values
of p and small samples.
2) By default, Dataplot switches from the exact method
to the normal approximation for sample sizes greater
than 30 (Agresti-Coull intervals are always generated).
You can specify the threshold with the command
SET BINOMIAL NORMAL APPROXIMATION THRESHOLD <value>
3) Some analysts prefer to use a continuity correction
(p + 0.5)/(n + 1)
You can specify whether to use the continuity
correction by entering the command
SET BINOMIAL CONTINUITY CORRECTION <ON/OFF>
The default is OFF.
2) The following distributional updates were made.
a) The YULCDF was updated to use an explicit formula (as
opposed to direct summation).
b) For the KS PLOT, the location and scale parameters are
estimated via the probability plot. For long-tailed
distributions, more accurate estimates may be obtained
by applying a biweight fit of the probability plot.
To specify this option, enter the command
SET PPCC PLOT LOCATION SCALE BIWEIGHT
To restore the use of the regular least squares
estimates of location and scale, enter
SET PPCC PLOT LOCATION SCALE DEFAULT
c) Added the following new continuous distributions.
1) Asymmetric Log-Laplace
ALDCDF(X,ALPHA,BETA) - cdf function
ALDPDF(X,ALPHA,BETA) - pdf function
ALDPPF(P,ALPHA,BETA) - ppf function
2) Log-Beta
LBECDF(X,ALPHA,BETA,C,D) - cdf function
LBEPDF(X,ALPHA,BETA,C,D) - pdf function
LBEPPF(P,ALPHA,BETA,C,D) - ppf function
3) Topp and Leone
TOPCDF(X,BETA) - cdf function
TOPPDF(X,BETA) - pdf function
TOPPPF(P,BETA) - ppf function
4) Generalized Topp and Leone
GTLCDF(X,ALPHA,BETA) - cdf function
GTLPDF(X,ALPHA,BETA) - pdf function
GTLPPF(P,ALPHA,BETA) - ppf function
5) Reflected Generalized Topp and Leone
RGTCDF(X,ALPHA,BETA) - cdf function
RGTPDF(X,ALPHA,BETA) - pdf function
RGTPPF(P,ALPHA,BETA) - ppf function
6) Wakeby:
WAKCDF(X,BETA,GAMMA,DELTA) - cdf function
WAKPPF(P,BETA,GAMMA,DELTA) - ppf function
d) Added the following new discrete distributions.
1) Beta-Geometric (Waring)
BGECDF(X,ALPHA,BETA) - cdf function
BGEPDF(X,ALPHA,BETA) - pdf function
BGEPPF(X,ALPHA,BETA) - ppf function
2) Beta-Negative Binomial (generalized Waring)
BNBCDF(X,ALPHA,BETA,k) - cdf function
BNBPDF(X,ALPHA,BETA,k) - pdf function
BNBPPF(X,ALPHA,BETA,k) - ppf function
3) Zeta
ZETCDF(X,ALPHA) - cdf function
ZETPDF(X,ALPHA) - pdf function
ZETPPF(X,ALPHA) - ppf function
4) Zipf
ZIPCDF(X,ALPHA,N) - cdf function
ZIPPDF(X,ALPHA,N) - pdf function
ZIPPPF(X,ALPHA,N) - ppf function
5) Borel-Tanner
BTACDF(X,LAMBDA,N) - cdf function
BTAPDF(X,LAMBDA,N) - pdf function
BTAPPF(X,LAMBDA,N) - ppf function
6) Lagrange-Poisson
LPOCDF(X,LAMBDA,THETA) - cdf function
LPOPDF(X,LAMBDA,THETA) - pdf function
LPOPPF(X,LAMBDA,THETA) - ppf function
7) Leads in Coin Tossing (Discrete Arcsine)
LCTCDF(X,N) - cdf function
LCTPDF(X,N) - pdf function
LCTPPF(X,N) - ppf function
8) Classical Matching
MATCDF(X,K) - cdf function
MATPDF(X,K) - pdf function
MATPPF(X,K) - ppf function
9) Polya-Aeppli
PAPCDF(X,THETA,P) - cdf function
PAPPDF(X,THETA,P) - pdf function
PAPPPF(X,THETA,P) - ppf function
10) Generalized Logarithmic Series
GLSCDF(X,THETA,BETA) - cdf function
GLSPDF(X,THETA,BETA) - pdf function
GLSPPF(X,THETA,BETA) - ppf function
11) Geeta
GETCDF(X,THETA,BETA) - cdf function
GETPDF(X,THETA,BETA) - pdf function
GETPPF(X,THETA,BETA) - ppf function
This distribution can also be parameterized with
MU and BETA.
12) Quasi Binomial Type 1
QBICDF(X,P,PHI) - cdf function
QBIPDF(X,P,PHI) - pdf function
QBIPPF(X,P,PHI) - ppf function
13) Generalized Negative Binomial
GNBCDF(X,THETA,BETA,M) - cdf function
GNBPDF(X,THETA,BETA,M) - pdf function
GNBPPF(X,THETA,BETA,M) - ppf function
14) Truncated Generalized Negative Binomial
GNTCDF(X,THETA,BETA,M,N) - cdf function
GNTPDF(X,THETA,BETA,M,N) - pdf function
GNTPPF(X,THETA,BETA,M,N) - ppf function
15) Discrete Weibull
DIWCDF(X,Q,BETA) - cdf function
DIWPDF(X,Q,BETA) - pdf function
DIWPPF(X,Q,BETA) - ppf function
DIWHAZ(X,Q,BETA) - hazard function
16) Consul (a generalized geometric)
CONCDF(X,THETA,M) - cdf function
CONPDF(X,THETA,M) - pdf function
CONPPF(X,THETA,M) - ppf function
17) Lost Games
LOSCDF(X,P,R) - cdf function
LOSPDF(X,P,R) - pdf function
LOSPPF(X,P,R) - ppf function
18) Generalized Lost Games
GLGCDF(X,P,J,A) - cdf function
GLGPDF(X,P,J,A) - pdf function
GLGPPF(X,P,J,A) - ppf function
19) Katz
KATCDF(X,ALPHA,BETA) - cdf function
KATPDF(X,ALPHA,BETA) - pdf function
KATPPF(X,ALPHA,BETA) - ppf function
e) The Waring routines (WARCDF, WARPDF, WARPPF)
were re-written to take advantage of their relationship
to the beta-geometric (the Waring is simply a different
parameterization of the beta-geometric). This makes
the Waring routines more computationally efficient and
more accurate.
3) Added the following LET sub-commands.
a) Added the harmonic number and generalized harmonic
number functions:
LET A = HARMNUMB(N)
LET A = HARMNUMB(N,M)
b) For certain types of plots, it can be useful to add a
small bit of random noise to a variable to avoid
overplotting. This is commonly referred to as jittering.
To simplify this, the following command was added:
LET DELTA = <value>
LET Y = JITTER X DELTA
The value of DELTA is used to control the magnitude of
the jittering. That is, the value of x(i) will be
changed to a value x(i) + noise where noise is in the
range (-DELTA/2,DELTA/2).
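The two LET sub-commands above can be sketched in Python as follows. The harmonic number is taken to be the sum of 1/k for k = 1 to N, and the generalized harmonic number is assumed to be the sum of 1/k**M. The jitter noise is assumed to be uniform on the stated range, since the distribution of the noise is not specified above. The function names only mirror the Dataplot names.

```python
import random


def harmnumb(n, m=1):
    """Harmonic number (m = 1) or generalized harmonic number of order m."""
    return sum(1.0 / k ** m for k in range(1, n + 1))


def jitter(x, delta, rng=None):
    """Add noise drawn from the range -delta/2 to delta/2 to each value.

    Uniform noise is an assumption of this sketch.
    """
    rng = rng or random.Random()
    return [xi + rng.uniform(-delta / 2.0, delta / 2.0) for xi in x]
```

Passing a seeded random.Random instance as rng makes the jittered values reproducible.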
4) Made the following updates to the CONSENSUS MEANS command.
a) If a within-lab standard deviation is zero (i.e., the lab
has only a single unique measurement value), that lab
will be omitted from the analysis (it will be included
in the initial summary table). Previously, Dataplot
treated this as an error and would not run the
consensus means analysis.
b) Added the Fairweather method. There are 3 separate
methods for generating 95% confidence intervals for this
method (the original method proposed by Fairweather,
an improvement suggested by Cox, and a method developed
by Rukhin). The output for this method is only printed
if the minimum number of observations for a lab is
greater than 5.
c) Added the Bayesian Consensus Procedure (BCP) method of
Hagwood and Guthrie. This is a refinement of the BOB
method. For this method, the consensus mean and the
standard deviation of the consensus mean are asymptotically
equivalent to the posterior mean and standard deviation of
a fully Bayesian method.
d) Dataplot currently supports 12 methods. Most users will
only be interested in a subset of these methods. You
can now selectively turn individual methods on or off
(all methods are on by default) with the commands:
SET MANDEL PAULE <ON/OFF>
SET MODIFIED MANDEL PAULE <ON/OFF>
SET VANGEL RUHKIN <ON/OFF>
SET BOB <ON/OFF>
SET SCHILLER EBERHARDT <ON/OFF>
SET MEAN OF MEANS <ON/OFF>
SET GRAND MEAN <ON/OFF>
SET GRAYBILL DEAL <ON/OFF>
SET GENERALIZED CONFIDENCE INTERVAL <ON/OFF>
SET DERSIMONIAN LAIRD <ON/OFF>
SET FAIRWEATHER <ON/OFF>
SET BAYESIAN CONSENSUS PROCEDURE <ON/OFF>
5) The following updates and enhancements were made to
the graphics commands.
a) Added the command:
SET 4-PLOT DISTRIBUTION
The 4-plot by default consists of a run sequence plot,
a lag plot, a histogram, and a normal probability plot.
The above command allows us to replace the normal
probability plot with an exponential probability plot.
This is useful when checking the assumptions for a
Homogeneous Poisson Process (HPP) where we assume the
interarrival times follow an exponential distribution.
b) Added the command:
REPAIR PLOT Y X CENSOR
This is used to plot repair data where we may have
multiple systems and each system may have a single
censoring time (i.e., the time between the last repair
and the end of the test). Enter HELP REPAIR PLOT
for details.
c) Added the command:
MEAN REPAIR FUNCTION PLOT Y X CENSOR
d) Added the command
TRILINEAR PLOT Y1 Y2 Y3
This is used for plots where the rows of Y1, Y2, and
Y3 are mixtures (i.e., each row sums to 1, or to 100
if you are using percentage units).
6) Updated the RELIABILITY TREND TEST in the following
ways.
a) Fixed a bug in the reverse arrangements test.
b) Modified the output format for better clarity.
c) Added support for multiple systems. For multiple systems,
the tests will be applied to each individual system and
then composite tests will be performed.
d) Added support for HTML, Latex, and RTF format.
7) The following bug fixes were made:
a) The 2 variable case for the chi-square goodness of fit
test for discrete distributions had a bug. This has
been fixed. For older versions, a work around is
SET MINSIZE = 1
LET Y3 XLOW XHIGH = COMBINE FREQUENCY TABLE Y2 X2
POISSON CHI-SQUARE GOODNESS OF FIT Y3 XLOW XHIGH
b) Some bugs with LET subcommands and SUBSETTING were
corrected.
c) A bug involving IF statements within nested loops was
corrected.
d) A few other miscellaneous bug fixes were made.
-----------------------------------------------------------------------
The following enhancements were made to DATAPLOT
September 2005 - April 2006.
-----------------------------------------------------------------------
1) For many one-factor plots, it is useful to sort the horizontal
axis based on the value of some statistic (most commonly a
location statistic such as the mean, median, minimum, or
maximum). The following command was added to help generate
these sorted plots:
LET XSORT INDX = SORT BY <stat> X GROUPID
For example, to generate a sorted mean plot for variables
Y and X, you would do something like
LET X2 INDX = SORT BY MEAN Y X
X1TIC MARK LABEL FORMAT VARIABLE
X1TIC MARK LABEL CONTENT INDX
MEAN PLOT Y X2
This can be used with the following types of plots
i) <stat> PLOT Y X
where <stat> is a desired statistic (e.g., MEAN or
SD).
ii) BOX PLOT Y X
iii) PLOT Y X GROUP
For details, enter HELP SORT BY STATISTIC.
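The idea behind SORT BY MEAN can be sketched in Python as follows. The sketch recodes each group identifier by the rank of its group mean and returns an index list giving the original group identifiers in sorted order. This is one plausible reading of what the X2 and INDX variables in the example hold; the exact contents Dataplot returns are described in HELP SORT BY STATISTIC and may differ. The function name is hypothetical.

```python
from collections import defaultdict


def sort_by_mean(y, x):
    """Return (x2, indx) for response y and group-id variable x.

    indx lists the original group ids in ascending order of group mean;
    x2 replaces each group id by its position (1, 2, ...) in that order.
    """
    groups = defaultdict(list)
    for yi, xi in zip(y, x):
        groups[xi].append(yi)
    means = {g: sum(v) / len(v) for g, v in groups.items()}
    # Ties in the mean are broken by the group id to keep the order stable.
    indx = sorted(means, key=lambda g: (means[g], g))
    position = {g: i + 1 for i, g in enumerate(indx)}
    x2 = [position[xi] for xi in x]
    return x2, indx
```

Plotting the group means against x2 then gives a monotone sequence, and indx supplies the tick mark labels in the matching order.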
These plots often have alphabetic tick mark labels. The
following enhancements were made to simplify the use
of alphabetic tick mark labels with sorted plots.
a) The TIC MARK LABEL FORMAT and TIC MARK LABEL CONTENT
commands were previously augmented to allow numeric
variables, group label variables, or the row label
variable as the contents for the tick mark labels.
Specifically,
LET LAB = DATA 50 40 30 20 10 0
X1TIC MARK LABEL FORMAT VARIABLE
X1TIC MARK LABEL CONTENT LAB
LET IG = GROUP LABELS A B C D E
X1TIC MARK LABEL FORMAT GROUP LABEL
X1TIC MARK LABEL CONTENT IG
X1TIC MARK LABEL FORMAT ROW LABELS
This has been enhanced to allow an index variable to
be specified on the above TIC MARK LABEL CONTENT
commands (the index variable is typically generated by
a SORT BY command). The index variable specifies
the order in which the tic mark labels will be generated.
So the above examples can be augmented by
LET X2 INDX = SORT BY MEAN Y X
LET LAB = DATA 50 40 30 20 10 0
X1TIC MARK LABEL FORMAT VARIABLE
X1TIC MARK LABEL CONTENT LAB INDX
LET X2 INDX = SORT BY MEAN Y X
LET IG = GROUP LABELS A B C D E
X1TIC MARK LABEL FORMAT GROUP LABEL
X1TIC MARK LABEL CONTENT IG INDX
LET X2 INDX = SORT BY MEAN Y X
X1TIC MARK LABEL FORMAT ROW LABELS
X1TIC MARK LABEL CONTENT INDX
b) The LET ... = GROUP LABEL .... command was augmented in
the following two ways.
i) You can specify literal strings for group labels.
For example,
LET IG = GROUP LABEL BATCHSP()1 BATCHSP()2 ...
BATCHSP()3 BATCHSP()4
The strings are separated by spaces. If you need to
include a space in a particular string, use the
SP() as in the above example.
ii) Pre-defined strings can be used to define a group
label variable. For example,
LET IG = GROUP LABEL ST1 TO ST10
where ST1, ST2, ...., ST10 are previously defined
strings. The TO syntax is useful in this context
when the number of strings is large.
Dataplot's algorithm for parsing the GROUP LABEL command
is:
i) Dataplot first checks the character variables file
(HELP SET CONVERT CHARACTER for details). If the
first name listed is found, Dataplot uses this
character variable to define the group labels.
ii) If a character variable is not found, Dataplot
checks all the listed names to see if they are
previously defined strings. If they are, then
Dataplot substitutes the values of these strings.
iii) If one or more of the names is not a previously
defined string, then Dataplot treats all of the
names as literal text strings.
2) You can now pass arguments to macros.
To pass arguments to a macro, do something like
CALL SAMPLE.DP arg1 arg2 arg3
Up to 10 arguments may be passed (although limits on command
line lengths still apply). Arguments containing spaces or
hyphens should be enclosed in quotes. The character limit for
a single argument is 40 characters.
In the SAMPLE.DP macro, if a $1 is encountered, it will be
replaced with "arg1", if a $2 is encountered, it will be
replaced with "arg2" and so on. A $0 will substitute the
number of arguments given on the CALL command.
This substitution will only occur if a command line is contained
within a macro (i.e., if no macro is active, the "$" will not
signal any substitution and it will remain in the command line
as given).
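The substitution rule described above can be modeled with a short Python sketch. It replaces $1, $2, and so on with the corresponding argument and $0 with the number of arguments, and it leaves the line untouched when no macro is active. How Dataplot treats a reference to an argument that was not supplied is not stated above; the sketch leaves such a reference unchanged, which is an assumption. The function name and its parameters are hypothetical.

```python
import re


def substitute_macro_args(line, args, char="$", in_macro=True):
    """Model of macro argument substitution for one command line."""
    if not in_macro:
        return line  # outside a macro the substitution character is literal
    pattern = re.compile(re.escape(char) + r"(\d+)")

    def replace(match):
        index = int(match.group(1))
        if index == 0:
            return str(len(args))      # $0 -> number of arguments
        if 1 <= index <= len(args):
            return args[index - 1]     # $k -> k-th argument
        return match.group(0)          # assumption: leave unknown refs alone

    return pattern.sub(replace, line)
```

For example, with arguments Y and X the line "PLOT $1 $2" becomes "PLOT Y X".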
Dataplot currently only supports one level of argument
substitution for macros. That is, the values of the macro
arguments (i.e., the $1, $2, etc.) will contain the values
given by the most recent CALL command that specified at least
one argument. If you need to nest CALL commands with macro
arguments, the recommended work around is to have the
higher level macro extract any macro arguments passed to it
into temporary variables or strings before calling any other
macros. For example, suppose SAMPLE.DP needs to call
SAMPLE2.DP with arguments. You could do something like
the following in SAMPLE.DP:
. Start of SAMPLE.DP macro
let string zzzzs1 = $1
let string zzzzs2 = $2
let string zzzzs3 = $3
...
call sample2.dp newarg1 newarg2
The default character for argument substitution is the
"$". To use a different character, enter the command
MACRO SUBSTITUTION CHARACTER <character>
3) The following enhancements were made to the CAPTURE
command (the CAPTURE command re-directs alphanumeric output
to a file rather than displaying it on the screen).
a) Sometimes it may be useful to have the output sent to
both the screen and to a file. You can do this by
entering the command
CAPTURE SCREEN ON
To restore CAPTURE output only being sent to the
CAPTURE file, enter the command
CAPTURE SCREEN OFF
b) Sometimes it may be useful to selectively send output to
the CAPTURE file. You can do this with the following
commands:
CAPTURE SUSPEND
CAPTURE RESUME
where SUSPEND specifies that output will be sent to the
screen rather than the CAPTURE file (note that the CAPTURE
file remains open) and RESUME will send the output to
the currently open CAPTURE file. You can enter as many
CAPTURE SUSPEND/CAPTURE RESUME sequences as you like
between a CAPTURE/END OF CAPTURE session.
Note that OFF is a synonym for SUSPEND and ON is a
synonym for RESUME.
4) Made the following probability distribution updates:
a) Added confidence intervals for the maximum likelihood
estimates for the geometric distribution.
b) Added confidence intervals for the maximum likelihood
estimates for the Poisson distribution.
c) Added support for the following new probability
distributions:
1) Added the type 2 generalized logistic distribution.
Enter HELP GL2PDF for details.
2) Added the type 3 generalized logistic distribution.
Enter HELP GL3PDF for details.
3) Added the type 4 generalized logistic distribution.
Enter HELP GL4PDF for details.
4) Added the Hosking parameterization of the generalized
logistic distribution. Enter HELP GL5PDF for details.
5) Added the generalized Tukey-Lambda distribution. Enter
HELP GLDPDF for details.
6) Added the beta-normal distribution. Enter HELP BNOPDF
for details.
7) Added the asymmetric log double exponential (Laplace)
distribution. Enter HELP ALDPDF for details.
5) Added or modified the following analysis commands.
a) The Durbin test for identical effects in a two-way
table for balanced incomplete block designs is supported
with the command
DURBIN TEST Y BLOCK TREATMENT
Enter
HELP DURBIN TEST
for details.
b) The TOLERANCE LIMITS command generates both normal tolerance
limits and non-parametric tolerance limits. You can now
specify only one of these with the commands
NORMAL TOLERANCE LIMITS
NONPARAMETRIC TOLERANCE LIMITS
c) The GRUBBS TEST for outlier detection was previously augmented
to generate three distinct tests:
i) a test for both the minimum and maximum points as
outliers.
ii) a test for the minimum point as an outlier.
iii) a test for the maximum point as an outlier.
This has now been modified into three distinct commands:
GRUBBS TEST Y
GRUBBS MINIMUM TEST Y
GRUBBS MAXIMUM TEST Y
This was done so that the internally saved parameters
(e.g., STATVAL, STATCDF, etc.) will now be correct for
the appropriate test.
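The three test statistics that correspond to these commands can be sketched in Python using the standard Grubbs definitions (the deviation from the sample mean divided by the sample standard deviation). The sketch computes only the statistics, not the critical values or the saved parameters such as STATVAL, and the function name is hypothetical.

```python
import statistics


def grubbs_statistics(y):
    """Return the two-sided, minimum, and maximum Grubbs statistics."""
    mean = statistics.mean(y)
    sd = statistics.stdev(y)  # sample standard deviation (n - 1 divisor)
    return {
        "two_sided": max(abs(v - mean) for v in y) / sd,
        "minimum": (mean - min(y)) / sd,
        "maximum": (max(y) - mean) / sd,
    }
```

The two-sided statistic always equals the larger of the minimum and maximum statistics, which is why the three tests need separate saved values.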
d) The CONSENSUS MEANS command was modified in a number of
ways. Specifically,
1) The output format was modified to make it more
consistent and to provide better clarity. In
particular, a clearer distinction is made between
standard uncertainty (the standard error of the
consensus mean), expanded uncertainty (2*standard
error) and expanded uncertainty based on a
normal or t percent point value.
2) Modified the summary tables. There are now 4 summary
tables generated:
i) A summary table of the original data.
ii) A summary table of the 95% confidence limits
generated by each method
iii) A summary table of the standard uncertainties
generated by each method (i.e., the standard
error of the consensus mean estimate)
iv) A summary table of the expanded uncertainties
generated by each method (i.e., 2 times
the standard error of the consensus mean estimate)
3) Added the following new methods:
i) The Graybill-Deal method now generates confidence
limits using a method proposed by Andrew Rukhin.
It also generates 4 distinct estimates of the
variance of the consensus mean (the Sinha method,
the naive method, and 2 methods proposed by
Nien-Fan Zhang). The commonly used naive method
is known to seriously underestimate the variance
for small sample sizes.
ii) Added the generalized confidence interval method
proposed by Hari Iyer and Jack Wang.
iii) Added the DerSimonian-Laird method.
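To make the terms standard uncertainty and expanded uncertainty concrete, the Python sketch below computes the Graybill-Deal consensus mean with the naive variance estimate, using the usual textbook form: each lab mean is weighted by n/s**2, and the naive variance is the reciprocal of the sum of the weights. It does not attempt the Sinha or Zhang variance estimates or the Rukhin confidence limits. Skipping labs with zero within-lab variance is a choice made for this sketch, and the function name is hypothetical.

```python
import statistics


def graybill_deal_naive(labs):
    """labs is a list of lists, one list of measurements per lab.

    Returns (consensus_mean, standard_uncertainty, expanded_uncertainty)
    where the expanded uncertainty is 2 times the standard uncertainty.
    """
    weights, means = [], []
    for data in labs:
        if len(data) < 2:
            continue  # no within-lab variance can be estimated
        variance = statistics.variance(data)
        if variance == 0.0:
            continue  # sketch choice: omit labs with zero within-lab sd
        weights.append(len(data) / variance)
        means.append(statistics.mean(data))
    total = sum(weights)
    consensus = sum(w * m for w, m in zip(weights, means)) / total
    standard_uncertainty = (1.0 / total) ** 0.5  # naive estimate
    return consensus, standard_uncertainty, 2.0 * standard_uncertainty
```

As noted above, the naive estimate tends to understate the variance for small samples, so it is best read as a lower bound on the uncertainty.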
4) Previous versions of Dataplot allowed you to create
the CONSENSUS MEANS output in HTML format
(CAPTURE HTML FILE.HTM) or Latex format
(CAPTURE LATEX file.tex). This was extended to
include Rich Text Format (RTF). The RTF option
is used for creating output that can be read into
Microsoft Word (RTF is a protocol Microsoft created
for transporting word processing files between
different word processing programs). For example
CAPTURE RTF FILE.RTF
CONSENSUS MEAN Y X
END OF CAPTURE
You can then import FILE.RTF into Word. Note that
although RTF is supposed to be a portable format,
our experience is that non-Word word processors do a
poor job of importing the Dataplot RTF files (tables
tend to be problematic for non-Word software and
Dataplot is creating most of its RTF output as tables).
6) The following updates were made to graphics output devices.
a) The GD library, used to generate JPEG and PNG format
graphs, was updated from version 1.84 to 2.033. The
primary consequence of this is that we can now generate
GIF format files as well. To generate GIF files, enter
SET IPL1NA PLOT.GIF
DEVICE 2 GD GIF
b) Dataplot can now generate graphs in Latex format.
The primary motivation for using this format is
to generate publication quality graphs. There are
some unique features to this device driver that are
described in detail in the HELP LATEX command.
7) The following statistic command was added.
LET A = RATIO Y1 Y2
This statistic is the sum of Y1 divided by the sum of Y2.
The following additional commands are supported:
TABULATE RATIO Y1 Y2 X
CROSS TABULATE RATIO Y1 Y2 X1 X2
RATIO PLOT Y1 Y2 X
RATIO CROSS TABULATE PLOT Y1 Y2 X1 X2
BOOTSTRAP RATIO PLOT Y1 Y2
JACKNIFE RATIO PLOT Y1 Y2
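As an illustration of what the RATIO statistic computes, the
following is a minimal Python sketch (not Dataplot code). The
function names are hypothetical, and the requirement that Y1 and
Y2 have equal length is an assumption.

```python
def ratio_statistic(y1, y2):
    """Sum of y1 divided by sum of y2, as described for RATIO."""
    if len(y1) != len(y2):
        # Assumption: the two response variables are paired.
        raise ValueError("y1 and y2 must have the same length")
    denom = sum(y2)
    if denom == 0:
        raise ZeroDivisionError("sum of y2 is zero")
    return sum(y1) / denom


def tabulate_ratio(y1, y2, x):
    """Ratio statistic computed for each distinct value of x."""
    sums = {}
    for a, b, g in zip(y1, y2, x):
        s = sums.setdefault(g, [0.0, 0.0])
        s[0] += a
        s[1] += b
    return {g: s[0] / s[1] for g, s in sorted(sums.items())}


print(ratio_statistic([1, 2, 3], [2, 2, 2]))
print(tabulate_ratio([1, 2, 3, 4], [1, 1, 2, 2], [1, 1, 2, 2]))
```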
8) The following special function library functions were added:
I0INT - integral of the modified Bessel function of the
first kind and order 0
J0INT - integral of the Bessel function of the first kind
and order 0
K0INT - integral of the modified Bessel function of the
third kind and order 0
Y0INT - integral of the Bessel function of the second kind
and order 0
I0ML0 - difference of the modified Bessel function of the
first kind of order 0 and the modified Struve function
of order 0
I1ML1 - difference of the modified Bessel function of the first
kind of order 1 and the modified Struve function of
order 1
AIRINT - integral of the Airy function Ai
BIRINT - integral of the Airy function Bi
AIRYGI - modified Airy function Gi
AIRYHI - modified Airy function Hi
ATNINT - integral of the inverse-tangent function
9) Added the following LET subcommands:
a) LET Y2 = REPLACE GROUPID GROUP2 Y1
This command does the following:
1) It matches the values in GROUP2 against GROUPID and
returns the indices of the matching rows for the GROUPID
array.
2) The indices are used to access the corresponding value
in the Y1 array.
3) The corresponding row of Y2 is replaced with the Y1
value.
The abbreviated syntax
LET Y2 = REPLACE GROUPID GROUP
simply assigns a value of 1 in the corresponding row of Y2.
Enter HELP REPLACE for details.
b) LET Y2 X2 = MATRIX BIN M
This command is used to generate a frequency table for
the elements in a matrix. This can be used to generate
a histogram of the elements in a matrix. For example,
LET Y2 X2 = MATRIX BIN M
HISTOGRAM Y2 X2
Enter HELP MATRIX BIN for details.
c) LET M = MATRIX TRUNCATION M IVALUE
LET M = MATRIX LOWER TRUNCATION M IVALUE
Set all values in the matrix M that are less than
IVALUE to IVALUE. This command can be used in conjunction
with the MATRIX SUBTRACT command to remove background
values from a matrix. For example, if the background
value is 5, do something like
LET IBACK = 5
LET IZERO = 0
LET M = MATRIX SUBTRACT M IBACK
LET M = MATRIX TRUNCATION M IZERO
Likewise, you can use the following command to perform
an upper truncation:
LET M = MATRIX UPPER TRUNCATION M IVALUE
That is, any values in M greater than IVALUE are set to
IVALUE.
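The background-removal steps above can be sketched in Python as
follows. This is only an illustration of the arithmetic, not
Dataplot code; the function names are hypothetical and matrices
are represented as lists of rows.

```python
def matrix_subtract(m, value):
    """Subtract a scalar from every element of the matrix."""
    return [[e - value for e in row] for row in m]


def matrix_lower_truncation(m, ivalue):
    """Set every element less than ivalue to ivalue."""
    return [[max(e, ivalue) for e in row] for row in m]


def matrix_upper_truncation(m, ivalue):
    """Set every element greater than ivalue to ivalue."""
    return [[min(e, ivalue) for e in row] for row in m]


m = [[3, 7], [5, 12]]
m = matrix_subtract(m, 5)          # [[-2, 2], [0, 7]]
m = matrix_lower_truncation(m, 0)  # [[0, 2], [0, 7]]
print(m)
```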
10) The SET HISTOGRAM CLASS WIDTH command was previously implemented to
specify different default class width algorithms for
histograms. This command was extended to apply to the
following additional commands:
LET Y2 X2 = BINNED Y
LET Y2 X2 = MATRIX BIN Y
NORMAL MIXTURE MAXIMUM LIKELIHOOD Y
CHI-SQUARE GOODNESS OF FIT Y
2 SAMPLE CHI-SQUARE GOODNESS OF FIT Y
11) Added the following command
PROCESS ID
This command will print the process id and save this
process id in the internal parameter PID.
12) Made the following bug fixes.
a) Previously, if all elements of a response variable were
equal, the HISTOGRAM command would print an error message
and not generate the histogram. Dataplot will now
print a warning message, but will generate a histogram
with one non-zero class (it will generate one class above
and one class below with zero count as well).
b) In the TABULATE command, if all elements in the response
variable are identical, Dataplot now prints a warning message
(rather than an error message) and performs the tabulation anyway.
c) Corrected a bug in Friedman's test. The previous version
is correct if the original data is the rank within a block.
The corrected version does not require that the data
already be ranked.
d) The WILK SHAPIRO command was not returning the p-value in
the saved parameter PVALUE correctly. This was corrected.
e) For the command
LET Z2 = BIVARIATE INTERPOLATION Z Y X Y2 X2
the Y and X arguments were in the wrong order (i.e., the
command was interpreting Y X as X Y). This was corrected.
f) Fixed bugs in the
LET X = CHARACTER CODE IX1
LET X = ALPHABETIC CHARACTER CODE IX1
commands.
g) The command
LET Y2 XLOW XUPP = COMBINE FREQUENCY TABLE Y X
is used to combine low frequency bins. The original
implementation simply worked from left to right to
combine the bins. Since low frequency bins typically
occur in the left and right tails, the algorithm was
modified to move from the left tail to the center and
then from the right tail to the center.
h) Fixed a bug where the ORIENTATION command could cause
Dataplot to hang on subsequent plots if no DEVICE 2
command was defined and a software font was used to
draw text.
i) Dataplot creates and uses a number of temporary files
in the current directory.
If you have multiple sessions running from the current
directory, this can create a problem for these temporary
files. In most cases, a conflict does not occur because
Dataplot will open the file, read or write to the file,
and then close the file immediately. However, a few
files, such as the plot files dppl1f.dat and dppl2f.dat,
typically remain open. The effect of different Dataplot
sessions trying to access these files is system dependent.
1. On Unix and Windows 98/NT4 platforms, the file will
contain whatever was most recently written to it.
2. On Windows 2000/XP platforms, the Dataplot session
that opens the file first has a "lock" on the file.
This causes any subsequent Dataplot session that tries
to access the file to hang.
This is particularly a problem with the GUI version
on Windows 2000/XP. Specifically, if the Dataplot GUI
does not shut down cleanly, the underlying Dataplot
executable does not get killed. This then causes any
future attempt to open the GUI to hang since the "dead"
Dataplot executable has a lock on the file. You have to
use "Ctrl-Alt-Del" to bring up the Task Manager, select
"Processes", and then manually kill any "DPLAHEY.EXE"
processes in order to clear the dead process.
In particular, if you close the GUI by clicking the
"x" in the upper right hand corner (rather than clicking
the EXIT menu), this does not kill the underlying
DPLAHEY.EXE process.
As a partial solution to this problem, Dataplot should
now trap this condition. It will print a message
indicating how to clear the "dead" DPLAHEY.EXE process.
In addition, it will do one of two things in the current
Dataplot process:
a. It will attach the process id to the temporary file
name and then re-open the file.
b. It will simply ignore the file (so if dppl2f.dat is locked,
Dataplot will not write the current plot to dppl2f.dat
in the current Dataplot session).
You can specify which option Dataplot will use by entering
one of the following commands in your startup file
(c:\Program Files\NIST\DATAPLOT\DPLOGF.TEX):
SET TEMPORARY FILE PID
SET TEMPORARY FILE IGNORE
The default is PID.
-----------------------------------------------------------------------
The following enhancements were made to DATAPLOT June - August 2005.
-----------------------------------------------------------------------
1) The following matrix commands were added.
a. The sum of all elements in a matrix can be computed with
the following command
LET A = MATRIX SUM M
b. Previous versions of Dataplot allowed you to compute
various column or row statistics
(HELP MATRIX COLUMN STATISTIC or HELP MATRIX ROW STATISTIC
for details). This capability has been extended to the
case of computing the statistics for the entire matrix
with the command
LET A = MATRIX GRAND <stat> M
where <stat> denotes the desired statistic (the list
of supported statistics is the same as for the
MATRIX COLUMN STATISTIC and MATRIX ROW STATISTIC commands).
c. Previous versions of Dataplot allowed you to compute
various column or row statistics
(HELP MATRIX COLUMN STATISTIC or HELP MATRIX ROW STATISTIC
for details). This capability has been extended to the
case where the matrix is divided into equal partitions
with the command
LET MOUT = MATRIX PARTITION M NROW NCOL
with M, NROW, and NCOL denoting the input matrix, the number
of rows in each sub-matrix, and the number of columns in
each sub-matrix, respectively. Note that this command
returns a matrix (MOUT) of values.
That is, the original matrix is divided into sub-matrices
containing NROW rows and NCOL columns each. The partition
starts at row 1 and column 1. The number of rows in MOUT
is determined by dividing the number of rows in M by NROW.
Likewise, the number of columns is determined by dividing
the number of columns in M by NCOL. If this division
does not result in an integer value (e.g., 23 columns
in M and NCOL = 5 results in 3 columns left over), then the
last column, or row, of MOUT will be based on whatever
columns are left over.
In addition, the MATRIX PARTITION command has been extended
to accommodate unequal partitions where the partitions need
not be contiguous.
The syntax in this case is
LET MOUT = MATRIX PARTITION M TAGROW TAGCOL
with M denoting the input matrix. In this case, TAGROW and
TAGCOL are vectors with TAGROW having the same number of rows
as M and TAGCOL having the same number of columns as M.
The elements of TAGROW and TAGCOL identify which partition
each element of M belongs to. The output matrix will be
dimensioned based on the number of distinct values in
TAGROW and TAGCOL.
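The partitioning logic described above can be sketched in Python
as follows. This is an illustration only, not the Dataplot
implementation. The function names are hypothetical, the
statistic is passed in as a function, and ordering the output by
the sorted distinct tag values is an assumption. The equal
partition case is expressed by generating tags from NROW and
NCOL, so left-over rows or columns form a final, smaller
partition.

```python
def matrix_partition(m, tagrow, tagcol, stat):
    """Apply stat to each (row tag, column tag) block of m."""
    row_ids = sorted(set(tagrow))
    col_ids = sorted(set(tagcol))
    nrows, ncols = len(m), len(m[0])
    out = []
    for r in row_ids:
        out_row = []
        for c in col_ids:
            # Collect every element whose row and column carry
            # the current pair of tags (blocks need not be
            # contiguous).
            cell = [m[i][j]
                    for i in range(nrows) if tagrow[i] == r
                    for j in range(ncols) if tagcol[j] == c]
            out_row.append(stat(cell))
        out.append(out_row)
    return out


def equal_tags(n, size):
    """Tags for equal partitions of the given size, from index 0."""
    return [i // size for i in range(n)]


def mean(values):
    return sum(values) / len(values)


# 4 x 23 matrix, NROW = 2, NCOL = 5: the last output column is
# based on the 3 columns left over.
m = [[float(i * 23 + j) for j in range(23)] for i in range(4)]
mout = matrix_partition(m, equal_tags(4, 2), equal_tags(23, 5), mean)
print(len(mout), len(mout[0]))
```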
2) The following commands were added to compute probability
weighted moments and L-moments.
LET P = PROBABILITY WEIGHTED MOMENTS Y
LET L = L MOMENTS Y
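For reference, the following Python sketch computes sample
probability weighted moments and the first four L-moments using
the commonly cited unbiased estimators. Whether Dataplot uses
exactly these estimators (or returns exactly four moments) is an
assumption; the function names are hypothetical.

```python
def probability_weighted_moments(y, nmom=4):
    """Unbiased sample PWMs b_0, ..., b_(nmom-1)."""
    x = sorted(y)
    n = len(x)
    if n < nmom:
        raise ValueError("need at least nmom observations")
    b = []
    for r in range(nmom):
        total = 0.0
        for i, xi in enumerate(x, start=1):
            # Weight (i-1)(i-2)...(i-r) / ((n-1)(n-2)...(n-r))
            w = 1.0
            for k in range(1, r + 1):
                w *= (i - k) / (n - k)
            total += w * xi
        b.append(total / n)
    return b


def l_moments(y):
    """First four sample L-moments as linear combinations of PWMs."""
    b0, b1, b2, b3 = probability_weighted_moments(y, 4)
    return [b0,
            2.0 * b1 - b0,
            6.0 * b2 - 6.0 * b1 + b0,
            20.0 * b3 - 30.0 * b2 + 12.0 * b1 - b0]


print(l_moments([1.0, 2.0, 3.0, 4.0, 5.0]))
```

The first L-moment is the sample mean and the second is a
measure of scale, which is why L-moment estimators can be
viewed as a refinement of probability weighted moments.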
3) The following distributional updates were made.
a. Made the following enhancements to the generalized Pareto
maximum likelihood command.
1. L-moment and elemental percentile estimates are now
included. The L-moment estimators are a refinement of
probability weighted moments. The elemental percentile
method is described in Castillo, Hadi, Balakrishnan, and
Sarabia, "Extreme Value and Related Models with
Applications in Engineering and Science", Wiley, 2005.
One advantage of the elemental percentile approach is that
it does not have the restricted domain for the shape
parameter that the moment and maximum likelihood estimators
have.
2. The elemental percentile estimate is now used as the
starting value for the maximum likelihood. This seems
to improve the convergence of the ML method.
3. The methods used (moments, L-moments, elemental percentiles,
and maximum likelihood) do not estimate a location
parameter.
By default, these methods will now use the minimum data
value (minus an epsilon fudge factor) as the estimate of
location. This value is subtracted from the data before
applying the estimation procedures.
If you would like to provide your own location estimate,
enter the command
LET THRESHOL = <value>
Any data values less than the value specified for
THRESHOL will be omitted from the estimation. Note that
the generalized Pareto is often used in the context of
modeling the distribution of "points above a threshold",
so specifying a threshold greater than some of the data
points is fairly common.
4. The maximum likelihood estimates now include the normal
approximation confidence intervals for the scale and
shape parameters and, optionally, for select percentiles
of the data.
To specify percentile estimates, enter the command
SET MAXIMUM LIKELIHOOD PERCENTILES <varname>
where <varname> specifies the name of a variable containing
the desired percentiles. You can specify DEFAULT to
use a default set of values.
Be aware that for the generalized Pareto maximum
likelihood estimation, a relatively large sample size
may be required for the asymptotic normal approximations
to become reasonably accurate. Some studies have
indicated sample sizes of at least 500 may be required.
b. Added support for the maximum likelihood estimation for
the inverted Weibull distribution:
INVERTED WEIBULL MLE Y
INVERTED WEIBULL MLE Y X
The first syntax supports the full sample case. It will
return confidence intervals for the shape and scale
parameters for various values of alpha (based on the
normal approximations) and will return confidence intervals
for selected percentiles if you have entered a
SET MAXIMUM LIKELIHOOD PERCENTILES DEFAULT command.
The second syntax supports the censored case. This case
currently only returns point estimates.
c. The BINOMIAL MLE now returns improved confidence intervals.
d. We have modified the output from a number of the maximum
likelihood commands to make the output more consistent.
4) Made a number of bug fixes. In particular
a. Fixed a bug where the following form of the DERIVATIVE command
wasn't being recognized:
LET FUNCTION D = DERIVATIVE F WRT X
This syntax should now work.
b. Fixed the DIFFERENCE OF MEANS CONFIDENCE INTERVALS command
(in adding support for the HTML/LATEX output, we had shut
off the standard ASCII output). Fixed the HTML output
for this command.
-----------------------------------------------------------------------
The following enhancements were made to DATAPLOT January - May 2005.
-----------------------------------------------------------------------
1) Distributional Modeling Updates
a. Dataplot provides extensive distributional modeling
capabilities via probability plots and PPCC/KS plots. One
limitation of these methods is that they do not provide
estimates for the uncertainty of the parameter estimates
and for the distribution quantiles.
The BOOTSTRAP ... PLOT command was enhanced to support
distributional modeling for a number of distributions.
This can be used to obtain confidence intervals for the
distribution parameters, for selected percentiles of the
distribution, and for the value of the PPCC (or K-S
statistic).
For details, enter
HELP DISTRIBUTIONAL BOOTSTRAP
b. For the case of one shape parameter, the PPCC plot was
enhanced to support a group option (where group means
multiple batches of data as opposed to binned data).
In this case, a separate curve is drawn for each batch
of the data. This can be used to check for a common
shape parameter across multiple batches of data. For
details, enter
HELP PPCC PLOT
c. The PPCC PLOT and PROBABILITY PLOT commands support binned
data. Previously, the binning consisted of two variables:
the first contained the bin frequencies and the second
contained the mid-point of the bins. This form assumes
the bins are of equal width.
Some binned data may contain bins of unequal width. The
most common reason for this is to combine bins in the
tails which have low frequencies.
The PPCC PLOT and PROBABILITY PLOT commands were updated
to handle this case. In this case, the syntax is
PPCC PLOT Y XLOW XHIGH
PROBABILITY PLOT Y XLOW XHIGH
with Y, XLOW, and XHIGH denoting the frequency variable,
the lower class boundary, and the upper class boundary,
respectively. For details, enter
HELP PPCC PLOT
HELP PROBABILITY PLOT
d. The following enhancements were made to the maximum
likelihood estimation.
1. Added confidence intervals for the location and scale
parameters for the double exponential case
(DOUBLE EXPONENTIAL MAXIMUM LIKELIHOOD Y).
2. Added a weighted order statistics method to the Cauchy
maximum likelihood estimation (CAUCHY MLE Y). This method
was added because it is the method recommended for the
Cauchy Anderson-Darling test (see D'Agostino and Stephens,
"Goodness-Of-Fit Techniques", Marcel Dekker, 1986, p. 164).
3. Added support for the maximum case of the 2-parameter
extreme value type 2 (Frechet) distribution. This includes
confidence intervals for the estimated parameters and
for select percentiles (see
SET MAXIMUM LIKELIHOOD PERCENTILES).
e. The Anderson-Darling test now supports the extreme value
type 2 (Frechet) for the maximum case and the Cauchy
distribution.
f. Added support for the minimum case for the generalized
extreme value distribution. Added the GEVHAZ and GEVCHAZ
functions to compute the hazard and cumulative hazard
functions for the generalized extreme value distribution.
g. A number of distributions (Weibull, Gumbel, Frechet,
and generalized extreme value) support both a minimum and
a maximum case. The command
SET MINMAX <1/2>
is used to specify which case (1 = minimum, 2 = maximum).
If no MINMAX command is entered, previous versions used
the value 1 as the default (this was chosen since the
minimum case is what is typically used for the Weibull
distribution).
However, for the other distributions, the maximum case
is generally the one most used. For this reason, we
added the value 0 to indicate the default where the default
is now specific to each distribution. For the Weibull, the
default is the minimum and for the Gumbel, Frechet, and
generalized extreme value the default is the maximum.
2) Interlaboratory Analysis Updates
Dataplot added the following commands to perform an
interlaboratory analysis as documented in
"Standard Practice for Conducting an Interlaboratory Study
to Determine the Precision of a Test Method", ASTM
International, 100 Barr Harbor Drive, PO BOX C700,
West Conshohocken, PA 19428-2959, USA. This document is
in support of ASTM Standard E 691 - 99.
The specific commands added are:
LET A = REPEATABILITY STANDARD DEVIATION Y LABID
LET A = REPRODUCABILITY STANDARD DEVIATION Y LABID
LET H = H CONSISTENCY STATISTIC Y LABID
LET K = K CONSISTENCY STATISTIC Y LABID
LET H TAG = H CONSISTENCY STATISTIC Y LABID MATID
LET K TAG = K CONSISTENCY STATISTIC Y LABID MATID
E691 INTERLAB Y LABID MATID
The E691 INTERLAB command generates four tables documented
in the above document. The other commands are useful in
generating the plots described in this standard.
In addition, a number of built-in macros were added to
generate the various graphs demonstrated in the standard.
For more information, enter
HELP E691 INTERLAB
3) The following command can be useful in converting data in a
two-way table to a format required by certain Dataplot
commands
LET Y MATID LABID = REPLICATED STACK X1 ... XK LAB
The resulting output has the form
X1(1) 1 LAB(1)
. . .
X1(n) 1 LAB(n)
X2(1) 2 LAB(1)
. . .
X2(n) 2 LAB(n)
...
Xk(1) k LAB(1)
. . .
Xk(n) k LAB(n)
This is a variation of the STACK command. The distinction is
that the last variable entered is interpreted as a labid
variable that is replicated for each of the response variables.
For details, enter
HELP REPLICATED STACK
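The output layout shown above can be reproduced with the
following Python sketch. It is an illustration only, not
Dataplot code; the function name is hypothetical and the
requirement that each response variable has the same length as
LAB is an assumption.

```python
def replicated_stack(xs, lab):
    """Stack response variables X1..XK into (Y, MATID, LABID).

    MATID is 1 for values from X1, 2 for X2, and so on; LAB is
    repeated once for each response variable.
    """
    y, matid, labid = [], [], []
    for k, x in enumerate(xs, start=1):
        if len(x) != len(lab):
            # Assumption: one row per lab in every response.
            raise ValueError("each X must match the length of LAB")
        y.extend(x)
        matid.extend([k] * len(x))
        labid.extend(lab)
    return y, matid, labid


y, matid, labid = replicated_stack([[1.0, 2.0], [3.0, 4.0]], [10, 20])
for row in zip(y, matid, labid):
    print(row)
```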
4) Extreme Value Analysis
a. Enhancements were made to the CME and DEHAAN commands (these
estimate the parameters for a generalized Pareto distribution).
b. Added the following command
PEAKS OVER THRESHOLD PLOT Y
For details, enter HELP PEAKS OVER THRESHOLD PLOT.
5) Platform Specific Issues
a) We have separated the Windows installation files into two
distinct cases:
a) Windows 2000/XP platforms
b) Windows 95/98/NT4/ME platforms
This was required for compiler compatibility reasons. The
Lahey LF90 and Compaq Visual Fortran compilers were starting
to show some problems under Windows XP (specifically with
Service Pack 2).
For Windows 2000/XP, we have upgraded to the Intel 8.1
Fortran compiler. However, this compiler does not support
Windows 98 and earlier platforms. So the
Windows 95/98/NT4/ME version is still built using the
Lahey (for the GUI) and Compaq compilers.
b) We have updated the Mac OSX installation. There is now a
single file that you download that includes the executable,
the auxiliary files, the source, the needed Tcl/Tk files,
and the g77 compiler. This simplifies the installation
(e.g., you do not have to install Tcl/Tk yourself).
6) We have started overhauling some of the menus for the graphical
interface (GUI). This will not be radically different, just an
effort to provide better organization and clarity to the menus.
This updating will occur over several releases. The initial
update has re-arranged the top level menus. We have added
a "Getting Started" menu to help new users. The Reliability
and Extreme Values menus have been reorganized.
7) Dataplot uses the "." for the decimal point when reading data.
Some countries use the "," for this purpose.
We have added the command
SET DECIMAL POINT <char>
with <char> denoting the character to be used as the decimal
point.
Note that the use of this is currently fairly limited. It is
used in free-format reads only. It is provided to allow
international users the ability to read their data files
without editing them. Note that it does not apply if you
use the SET READ FORMAT command to define a format for the
data. It is also not used for writing data nor for the
output from Dataplot commands.
8) Fixed a number of bugs.
a. Fixed the COLUMN LIMITS command where the specified limits are
arrays (as opposed to single scalar values) to work in
the case where columns are of unequal length.
b. Internally, Dataplot treats strings and functions
interchangeably. The one distinction is that strings
preserve case. However, when strings are operating as
functions, we want them to be converted to upper case.
Dataplot was updated so that when a string is used as a
function, it is converted to upper case. This also
required some updates in the "^" and "&" string operators
to handle case conversions appropriately.
c. Fixed a bug in the Wilcoxon signed rank test when it was
used for a 1-sample test.
d. For the generalized Pareto percent point function, the scale
parameter was ignored. This was corrected.
e. Fixed a bug in the HFLPPF library function.
f. The GRUBBS TEST checks for both the maximum and minimum
values as outliers (relative to the normal distribution).
This is actually two tests: one for the minimum value and
one for the maximum value. When testing for both, the
value of alpha needs to be divided by 2.
The fix was to have the Grubbs test generate output for
3 tests:
1) Test both the minimum and the maximum value (with the
value of alpha adjusted appropriately).
2) Test the minimum value only.
3) Test the maximum value only.
To suppress the one-sided tests, enter the command
SET GRUBBS ONE SIDED OFF
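To make the distinction between the three tests concrete, the
following Python sketch computes the three Grubbs statistics and
the tail probability at which the t percent point is evaluated
when forming the critical value (alpha/(2N) when testing both
extremes, alpha/N for a one-sided test, following the usual
textbook formulation). It is an illustration, not the Dataplot
implementation, and the function names are hypothetical. The t
percent point itself is not computed here.

```python
import math


def grubbs_statistics(y):
    """Grubbs statistics for the both, minimum, and maximum tests."""
    n = len(y)
    mean = sum(y) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in y) / (n - 1))
    g_min = (mean - min(y)) / sd
    g_max = (max(y) - mean) / sd
    # Testing both extremes uses the larger of the two deviations.
    return {"both": max(g_min, g_max),
            "minimum": g_min,
            "maximum": g_max}


def t_tail_probability(alpha, n, two_sided):
    """Tail area for the t percent point in the critical value."""
    return alpha / (2 * n) if two_sided else alpha / n


g = grubbs_statistics([1.0, 2.0, 3.0, 4.0, 10.0])
print(g)
print(t_tail_probability(0.05, 5, True), t_tail_probability(0.05, 5, False))
```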
g. Fixed a bug in the discrete uniform random number generator.
The algorithm was generating random numbers on the interval
[1,N]. This was corrected to generate random numbers on the
interval [0,N].
h. If the PRINTING switch was set to OFF, the YATES command
was not writing information to files "dpst1f.dat" and
"dpst2f.dat". This was corrected so that these files are
printed regardless of the setting of the PRINTING switch.
-----------------------------------------------------------------------
The following enhancements were made to DATAPLOT June - December 2004.
-----------------------------------------------------------------------
1) The following updates were made for probability distributions.
A. The following enhancements were made to maximum likelihood
estimation.
1. The maximum likelihood output was rewritten for the
normal, lognormal, exponential, Weibull, gamma, beta,
Gumbel, and Pareto distributions.
Support was added for the following:
a. Improved confidence intervals for the distributional
parameters.
b. Support for censored data was added for the normal,
lognormal, exponential, Weibull, and gamma distributions.
c. Confidence intervals for selected percentiles were added
for the normal, lognormal, exponential, Weibull, gamma,
beta, and Gumbel distributions.
2. Added support for the Rayleigh, Maxwell, asymmetric
Laplace, generalized Pareto, and normal mixture
distributions:
RAYLEIGH MAXIMUM LIKELIHOOD Y
MAXWELL MAXIMUM LIKELIHOOD Y
ASYMMETRIC LAPLACE MAXIMUM LIKELIHOOD Y
GENERALIZED PARETO MAXIMUM LIKELIHOOD Y
LET NCOMP = <value>
NORMAL MIXTURE MAXIMUM LIKELIHOOD Y
The NCOMP parameter is used to specify how many normal
distributions to mix (it defaults to 2 if a value is not
specified for NCOMP).
The online help for the maximum likelihood was also rewritten.
Enter
HELP MAXIMUM LIKELIHOOD
for details.
B. Support was added for the following new distributions.
Skew-Laplace (Skew Double Exponential) distribution:
LET A = SDECDF(X,LAMBDA) - cdf of skew-Laplace distribution
LET A = SDEPDF(X,LAMBDA) - pdf of skew-Laplace distribution
LET A = SDEPPF(X,LAMBDA) - ppf of skew-Laplace distribution
Asymmetric Laplace (Asymmetric Double Exponential) distribution:
LET A = ADECDF(X,LAMBDA) - cdf of asymmetric Laplace
distribution
LET A = ADEPDF(X,LAMBDA) - pdf of asymmetric Laplace
distribution
LET A = ADEPPF(X,LAMBDA) - ppf of asymmetric Laplace
distribution
Maxwell-Boltzmann distribution:
LET A = MAXCDF(X,SIGMA) - cdf of Maxwell-Boltzmann
LET A = MAXPDF(X,SIGMA) - pdf of Maxwell-Boltzmann
LET A = MAXPPF(X,SIGMA) - ppf of Maxwell-Boltzmann
Rayleigh distribution:
LET A = RAYCDF(X) - cdf of Rayleigh
LET A = RAYPDF(X) - pdf of Rayleigh
LET A = RAYPPF(X) - ppf of Rayleigh
Generalized Inverse Gaussian distribution:
LET A = GIGCDF(X,CHI,LAMBDA,THETA) - cdf of generalized inverse
gaussian distribution
LET A = GIGPDF(X,CHI,LAMBDA,THETA) - pdf of generalized inverse
gaussian distribution
LET A = GIGPPF(X,CHI,LAMBDA,THETA) - ppf of generalized inverse
gaussian distribution
Generalized Asymmetric Laplace distribution:
LET A = GALCDF(X,KAPPA,TAU) - cdf of generalized asymmetric
Laplace distribution
LET A = GALPDF(X,KAPPA,TAU) - pdf of generalized asymmetric
Laplace distribution
LET A = GALPPF(X,KAPPA,TAU) - ppf of generalized asymmetric
Laplace distribution
Bessel I Function distribution:
LET A = BEICDF(X,S1SQ,S2SQ,NU) - cdf of Bessel I function
distribution
LET A = BEIPDF(X,S1SQ,S2SQ,NU) - pdf of Bessel I function
distribution
LET A = BEIPPF(X,S1SQ,S2SQ,NU) - ppf of Bessel I function
distribution
McLeish (related to Bessel K function) distribution:
LET A = MCLCDF(X,ALPHA) - cdf of McLeish distribution
LET A = MCLPDF(X,ALPHA) - pdf of McLeish distribution
LET A = MCLPPF(X,ALPHA) - ppf of McLeish distribution
Generalized McLeish (related to Bessel K function) distribution:
LET A = GMCCDF(X,ALPHA,A) - cdf of generalized McLeish distribution
LET A = GMCPDF(X,ALPHA,A) - pdf of generalized McLeish distribution
LET A = GMCPPF(X,ALPHA,A) - ppf of generalized McLeish distribution
C. The following random number generators, plots, and commands
were added:
LET LAMBDA = <value>
LET Y = SKEW LAPLACE RANDOM NUMBERS FOR I = 1 1 N
SKEW LAPLACE PROBABILITY PLOT Y
SKEW LAPLACE KOLMOGOROV SMIRNOV GOODNESS OF FIT Y
SKEW LAPLACE CHI-SQUARE GOODNESS OF FIT Y
SKEW LAPLACE PPCC PLOT Y
SKEW LAPLACE KS PLOT Y
LET LAMBDA = <value>
LET Y = ASYMMETRIC LAPLACE RANDOM NUMBERS FOR I = 1 1 N
ASYMMETRIC LAPLACE PROBABILITY PLOT Y
ASYMMETRIC LAPLACE KOLMOGOROV SMIRNOV GOODNESS OF FIT Y
ASYMMETRIC LAPLACE CHI-SQUARE GOODNESS OF FIT Y
ASYMMETRIC LAPLACE PPCC PLOT Y
ASYMMETRIC LAPLACE KS PLOT Y
LET Y = MAXWELL RANDOM NUMBERS FOR I = 1 1 N
MAXWELL PROBABILITY PLOT Y
MAXWELL KOLMOGOROV SMIRNOV GOODNESS OF FIT Y
MAXWELL CHI-SQUARE GOODNESS OF FIT Y
LET Y = RAYLEIGH RANDOM NUMBERS FOR I = 1 1 N
RAYLEIGH PROBABILITY PLOT Y
RAYLEIGH KOLMOGOROV SMIRNOV GOODNESS OF FIT Y
RAYLEIGH CHI-SQUARE GOODNESS OF FIT Y
LET CHI = <value>
LET LAMBDA = <value>
LET THETA = <value>
LET Y = GENERALIZED INVERSE GAUSSIAN RANDOM NUMBERS ...
FOR I = 1 1 N
GENERALIZED INVERSE GAUSSIAN PROBABILITY PLOT Y
GENERALIZED INVERSE GAUSSIAN KOLMOGOROV SMIRNOV ...
GOODNESS OF FIT Y
GENERALIZED INVERSE GAUSSIAN CHI-SQUARE ...
GOODNESS OF FIT Y
LET KAPPA = <value>
LET TAU = <value>
LET Y = GENERALIZED ASYMMETRIC LAPLACE RANDOM NUMBERS ...
FOR I = 1 1 N
GENERALIZED ASYMMETRIC LAPLACE PROBABILITY PLOT Y
GENERALIZED ASYMMETRIC LAPLACE KOLMOGOROV SMIRNOV ...
GOODNESS OF FIT Y
GENERALIZED ASYMMETRIC LAPLACE CHI-SQUARE ...
GOODNESS OF FIT Y
LET S1SQ = <value>
LET S2SQ = <value>
LET NU = <value>
LET Y = BESSEL I FUNCTION RANDOM NUMBERS FOR I = 1 1 N
BESSEL I FUNCTION PROBABILITY PLOT Y
BESSEL I FUNCTION KOLMOGOROV SMIRNOV GOODNESS OF FIT Y
BESSEL I FUNCTION CHI-SQUARE GOODNESS OF FIT Y
LET ALPHA = <value>
LET Y = MCLEISH RANDOM NUMBERS FOR I = 1 1 N
MCLEISH PROBABILITY PLOT Y
MCLEISH KOLMOGOROV SMIRNOV GOODNESS OF FIT Y
MCLEISH CHI-SQUARE GOODNESS OF FIT Y
MCLEISH PPCC PLOT Y
MCLEISH KS PLOT Y
LET ALPHA = <value>
LET A = <value>
LET Y = GENERALIZED MCLEISH RANDOM NUMBERS FOR I = 1 1 N
GENERALIZED MCLEISH PROBABILITY PLOT Y
GENERALIZED MCLEISH KOLMOGOROV SMIRNOV GOODNESS OF FIT Y
GENERALIZED MCLEISH CHI-SQUARE GOODNESS OF FIT Y
GENERALIZED MCLEISH PPCC PLOT Y
GENERALIZED MCLEISH KS PLOT Y
D. Dataplot uses the following definition for the generalized
Pareto probability density function:
f(x,gamma) = (1+gamma*x)**(-(1/gamma)-1)
However, many sources (e.g., Johnson, Kotz, and Balakrishnan)
define the generalized Pareto as:
f(x,gamma) = (1-gamma*x)**((1/gamma)-1)
That is, the sign of gamma is reversed. The command
SET GENERALIZED PARETO DEFINITION <value>
was added. A value of JOHNSON or KOTZ for this command
will use the second definition given. Any other value
will use the first (default) definition.
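The relationship between the two definitions can be checked
numerically. The following Python sketch codes both formulas
exactly as written above (for gamma not equal to zero and x in
the support) and confirms that the second definition evaluated
at -gamma equals the first evaluated at gamma. The function
names are hypothetical.

```python
import math


def gp_pdf_dataplot(x, gamma):
    """First (default) definition: (1+gamma*x)**(-(1/gamma)-1)."""
    return (1.0 + gamma * x) ** (-(1.0 / gamma) - 1.0)


def gp_pdf_johnson_kotz(x, gamma):
    """Second definition: (1-gamma*x)**((1/gamma)-1)."""
    return (1.0 - gamma * x) ** ((1.0 / gamma) - 1.0)


x, gamma = 2.0, 0.5
a = gp_pdf_dataplot(x, gamma)
b = gp_pdf_johnson_kotz(x, -gamma)
print(a, b, math.isclose(a, b))
```

In practice this means a shape estimate obtained under one
definition can be carried over to the other by changing its
sign.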
E. For the Pareto and Pareto type 2 distributions, what is
typically referred to as the location parameter (the A
parameter) is not a location parameter in the technical
sense that the relation
f(x;gamma,loc) = f((x-loc);gamma,0)
does not hold (it is a location parameter in the sense
that it defines a lower bound for the Pareto, but not the
Pareto type 2, distribution).
For this reason, we modified the Dataplot definition to
treat A as a second shape parameter. For example, the
Pareto PDF function is
PARPDF(x,gamma,a,loc,scale)
The A, LOC, and SCALE parameters are optional (A will
default to 1 if not given).
F. The following enhancements were made to the probability
plot and ppcc/ks plots.
Note that both the probability plot and the ppcc plot
ultimately depend on computing the percent point function
for the specified distribution. If the percent point function
is fast to compute (e.g., if it exists as a simple, closed
formula), then these plots can be generated rapidly even if the
number of data points is large. On the other hand, some percent
point functions can require a good deal of computation. For
example, some distributions compute the cumulative distribution
function via numerical integration and then compute the percent
point function by inverting the cumulative distribution
function. In these cases, the ppcc/ks plots can take too long
to generate to be practical (this tends to be less of an issue
with probability plots).
1. The following commands can be used to control how many
points are used to generate probability and ppcc/ks
plots, respectively:
SET PROBABILITY PLOT DATA POINTS <value>
SET PPCC PLOT DATA POINTS <value>
The algorithm is to compute equally spaced
percentiles of the full data set and then use these
percentiles in generating the probability and
ppcc/ks plot.
Using this command involves a trade-off between speed
and accuracy. For distributions with simple, closed
formulas or fast approximations for the percent point
function, there is little reason not to use the full data
set. However, for many distributions, the ppcc plot or
ks plot can become impractical as the number of data points
increases.
The minimum number of points is 20. The number of
points is typically set between 50 and 100. You may
want to use fewer than 50 points for a few distributions
with particularly expensive percent point functions.
For distributions with only moderately expensive percent
point functions, you may want to go as high as 100 or
200.
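The idea of replacing the full data set by equally spaced
percentiles can be sketched in Python as follows. This is only
an illustration: the plotting positions (k + 0.5)/m and the
linear interpolation between order statistics are assumptions,
not necessarily what Dataplot does, and the function name is
hypothetical. The minimum of 20 points follows the text above.

```python
def equally_spaced_percentiles(y, npoints):
    """Reduce y to npoints equally spaced sample percentiles."""
    x = sorted(y)
    n = len(x)
    npoints = max(npoints, 20)  # minimum number of points is 20
    if npoints >= n:
        return x  # nothing to gain; use the full (sorted) data set
    out = []
    for k in range(npoints):
        p = (k + 0.5) / npoints      # assumed percentile position
        pos = p * (n - 1)            # fractional index into x
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append(x[lo] + frac * (x[hi] - x[lo]))
    return out


reduced = equally_spaced_percentiles(list(range(1000)), 50)
print(len(reduced), reduced[0], reduced[-1])
```

The probability or ppcc/ks plot is then generated from the
reduced values, so the number of percent point function
evaluations depends on the number of points requested rather
than on the sample size.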
2. For the ppcc (or ks) plot, each point on the plot
represents one underlying probability plot (which in
turn requires n computations of the percent point
function, where n is the sample size). For distributions with
one shape parameter, Dataplot typically uses 50 points
(i.e., there are 50 underlying probability plots
computed). For two shape parameters, Dataplot typically
uses between 20 and 50 values for each shape parameter.
It decreases the number of values used when the percent
point function is expensive to compute.
The following command allows you to explicitly specify
how many probability plots are generated by the ppcc plot:
SET PPCC PLOT AXIS POINTS
followed by one or two values denoting the number of values
to use for the first and second shape parameters,
respectively. Specifying the second value is optional.
Set these values to 0 in order to revert to the Dataplot
default.
There are actually two reasons for using this command.
If the percent point function is fast to compute (e.g.,
the Weibull distribution), you may want to increase the
number of points in order to generate a finer grid. On
the other hand, if the percent point function is
expensive to compute, you may want to decrease the
number of points to speed up the generation of the plot.
3. If the ppcc (or ks) plot has two shape parameters, then
the default graphical format is to plot the ppcc (or
ks) value on the y-axis. Each curve on the plot
represents one value of one shape parameter while the
value of the x-axis coordinate represents the value of
the other shape parameter. To reverse the roles of the
shape parameters, enter the command
SET PPCC PLOT AXIS ORDER REVERSE
To restore the default, enter
SET PPCC PLOT AXIS ORDER DEFAULT
4. The PPCC PLOT will write the following to the file
dpst2f.dat (in the current directory):
PPCC LOCATION SCALE SHAPE1 SHAPE2
VALUE PARAMETER PARAMETER PARAMETER PARAMETER
This can be useful for plotting how the estimates of location
and scale change as the shape parameter changes. In some
cases, a less optimal value of the shape parameters may
be preferred if it generates more realistic estimates for
location and scale.
5. The PROBABILITY PLOT and PPCC PLOT were updated to support
multiply censored data.
The syntax is
CENSORED PROBABILITY PLOT Y X
CENSORED PPCC PLOT Y X
The X variable identifies which points represent failure
and which represent censoring times. Specifically,
X = 1 implies a failure time and X = 0 represents a
censoring time. The word CENSORED is required to
distinguish this syntax from the syntax for binned
data. Censored probability plots and censored ppcc
plots do not apply to binned data.
Dataplot supports two algorithms for determining plot
coordinates for a censored probability plot.
i. The uniform order statistic medians are generated
based on the full sample size. However, only
values that represent a failure time are actually
plotted.
ii. Instead of uniform order statistic medians, the
plotting positions for the failure times are
computed using the Kaplan-Meier product limit
estimate:
U(i) = ((n+0.7)/(n+0.4))*
PRODUCT[q=1 to i][(n-q+0.7)/(n-q+1.7)]
with n denoting the full sample size and q denoting
failure times only. The theoretical quantile is then
the percent point function of U(i).
The censored ppcc plot is then based on the correlation
coefficient of the censored probability plot.
To specify which censoring algorithm to use, enter the
commands
SET CENSORED PROBABILITY PLOT
SET CENSORED PPCC PLOT
The default is to use the uniform order statistic medians
algorithm.
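The product expression quoted for the second algorithm can be
evaluated with a few lines of Python. This is only a sketch: the
function name km_product is hypothetical, ties are broken by sort
order, and the code simply evaluates the expression as written for
each failure time.

```python
def km_product(times, failed):
    """Evaluate the product expression above at each failure time.

    times  : observed times (failure or censoring)
    failed : 1 for a failure time, 0 for a censoring time
    Returns (time, value) pairs for the failure times only.
    """
    n = len(times)
    order = sorted(range(n), key=lambda k: times[k])
    value = (n + 0.7) / (n + 0.4)
    out = []
    for q, k in enumerate(order, start=1):    # q = rank in the full sample
        if failed[k] == 1:                    # censored points add no factor
            value *= (n - q + 0.7) / (n - q + 1.7)
            out.append((times[k], value))
    return out
```

For a sample with no censoring the product telescopes to
(n - i + 0.7)/(n + 0.4), whose complement (i - 0.3)/(n + 0.4) is a
widely used approximation to the uniform order statistic medians.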
G. The following enhancements were made to the
Kolmogorov-Smirnov goodness of fit command and the KS PLOT.
1. The KS PLOT for the binned case (KS PLOT Y X) now
automatically plots the chi-square goodness of fit
statistic rather than the Kolmogorov-Smirnov goodness of
fit statistic. This is done since the chi-square goodness
of fit is explicitly based on binned data. Note that
bins with a size less than 5 are automatically combined
so that the minimum bin size is at least 5.
2. The KS PLOT will write the following to the file
dpst2f.dat (in the current directory):
PPCC LOCATION SCALE SHAPE1 SHAPE2
VALUE PARAMETER PARAMETER PARAMETER PARAMETER
This can be useful for plotting how the estimates of location
and scale change as the shape parameter changes. In some
cases, a less optimal value of the shape parameters may
be preferred if it generates more realistic estimates for
location and scale.
2) The following graphics commands were added.
a. Univariate average shifted histograms can be generated with
the command:
ASH HISTOGRAM Y
3) The following analysis commands were added.
a. Cochran's test can be performed with the command
COCHRAN TEST Y X
where Y is a response variable and X is a group identifier
variable. Cochran's test is an alternative to the
Kruskal-Wallis test when the response variable is dichotomous
(i.e., only 2 possible values).
b. The Kruskal-Wallis test was enhanced to write the pairwise
multiple comparisons to the file dpst1f.dat.
c. Van Der Waerden's test can be performed with the command
VAN DER WAERDEN TEST Y X
where Y is a response variable and X is a group identifier
variable. Van Der Waerden's test is an alternative to
KRUSKAL WALLIS that is based on normal scores of the ranks.
4) The following statistics and LET subcommands were added.
a. Kendall's tau can be computed with the command
LET A = KENDELL TAU Y1 Y2
b. For the chi-square goodness of fit, it is generally advisable
to combine bins with small counts (typically, 5 is recommended
as a minimum bin size). To convert equal width bins to
variable width bins with a minimum bin count, enter the
commands
LET MINSIZE =
LET Y2 XLOW XUPPER = COMBINE FREQUENCY TABLE Y X
c. The commands
LET Y2 X2 = ASH BINNED Y
LET Y2 X2 = COUNTS ASH BINNED Y
generate frequency tables based on the average shifted
histogram (see ASH HISTOGRAM above). The first syntax returns
the relative frequency while the second syntax returns a
count.
5) The following enhancements were made to the READ command.
a. In previous versions of Dataplot, if your data set contained
rows with an unequal number of columns, Dataplot would only
read the number of variables corresponding to the row
with the minimum number of columns.
If you would like Dataplot to pad missing columns with a
missing value, enter the command
SET READ PAD MISSING COLUMNS ON
For example, if you enter the command
READ FILE.DAT X1 X2 X3 X4 X5
then rows with fewer than five columns will have the missing
columns set to a missing value. To set the numeric value that
represents a missing value, enter
SET READ MISSING VALUE
followed by the desired numeric value.
To reset the default behavior, enter the command
SET READ PAD MISSING COLUMNS OFF
In some cases, missing columns would be indicative of an
error in the data file.
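The difference between the two settings can be sketched in Python as
follows. The function read_rows and its arguments are hypothetical
and only mimic the behavior described above for whitespace-separated
numeric data; this is not the Dataplot reader.

```python
def read_rows(lines, nvars, pad=False, missing=0.0):
    """Parse numeric rows; pad short rows or truncate to the shortest row."""
    rows = [[float(tok) for tok in line.split()]
            for line in lines if line.strip()]
    if pad:
        # PAD MISSING COLUMNS ON: short rows are filled with the missing value
        return [(r + [missing] * nvars)[:nvars] for r in rows]
    # PAD MISSING COLUMNS OFF: only as many variables as the shortest row has
    width = min([nvars] + [len(r) for r in rows])
    return [r[:width] for r in rows]
```

With the rows "1 2 3", "4 5", and "6 7 8" and three requested
variables, the unpadded read keeps two variables, while the padded
read keeps three and fills the gap in the second row.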
b. The SUBSET/EXCEPT/FOR clause on a READ command was ambiguous.
The ambiguity arises from the fact that it is not clear whether
the SUBSET/EXCEPT/FOR clause refers to the lines in the
data file being read or to the output variables that are
created by the READ command. We address this with the
following command:
SET READ SUBSET
This command takes two settings, each either PACK or DISPERSE.
PACK means the SUBSET/EXCEPT/FOR clause
does not apply while DISPERSE means that it does. The
first setting applies to the input file while the second
setting applies to the created data variables.
This is demonstrated with the following example (note that
P-D means the data file is set to PACK and the output
variable is set to DISPERSE). The first column is the
data in the file while the remaining columns show what
the resulting data variable should look like.
READ FILE.DAT X FOR I = 1 2 10
X P-D P-P D-P D-D
===========================================
1 1 1 1 1
2 0 2 3 0
3 2 3 5 3
4 0 4 7 0
5 3 5 9 5
6 0 6 - 0
7 4 7 - 7
8 0 8 - 0
9 5 9 - 9
10 - 10 - -
The default setting is PACK-DISPERSE (this is the default
because this is the behavior of previous versions of Dataplot).
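The four columns of the table above can be reproduced with a short
Python sketch. The function read_with_for is hypothetical; it models
only a FOR I = start step stop clause and uses 0 for rows that are
never filled, as in the table.

```python
def read_with_for(file_values, start, step, stop,
                  file_mode="PACK", var_mode="DISPERSE"):
    """Mimic READ ... FOR I = start step stop under the two settings."""
    selected = list(range(start, stop + 1, step))   # 1-based indices
    if file_mode == "DISPERSE":
        # the clause selects which lines of the file are read
        source = [file_values[i - 1] for i in selected
                  if i <= len(file_values)]
    else:
        # PACK: the clause is not applied to the file; read every line
        source = list(file_values)
    if var_mode == "DISPERSE":
        # the clause selects which rows of the output variable are filled
        pairs = list(zip(selected, source))
        out = [0] * (pairs[-1][0] if pairs else 0)
        for row, value in pairs:
            out[row - 1] = value
        return out
    # PACK: values are stored in consecutive rows
    return source
```

For example, read_with_for(list(range(1, 11)), 1, 2, 10) uses the
default PACK-DISPERSE setting and returns the P-D column.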
6) Miscellaneous Updates
a. Added the command
SET POSTSCRIPT DEFAULT COLOR
Postscript devices can be either black and white or color.
Dataplot assumes black and white by default. After the
DEVICE <2/3> POSTSCRIPT command, you can enter
DEVICE <2/3> COLOR ON
Although this works fine for DEVICE 2, it presents
complications for DEVICE 3 (this is the device used by the
PP command to print the current graph to a Postscript
printer). Dataplot opens/closes this device as needed
without the user entering any commands. It can be
difficult to determine when to insert a DEVICE 3 COLOR ON
command.
If you enter
SET POSTSCRIPT DEFAULT COLOR ON
then Dataplot will assume Postscript devices are color
(this applies to both DEVICE 2 and DEVICE 3, although it
is primarily motivated for DEVICE 3 output).
b. The default algorithm for class width in Dataplot is to
use 0.3*s where s is the sample standard deviation.
A number of different algorithms have been proposed to
obtain "optimal" class widths. The command
SET HISTOGRAM CLASS WIDTH
can be used to specify the default class width that Dataplot
will use for the HISTOGRAM and ASH HISTOGRAM commands.
Additional choices may be added in future releases.
The current choices are:
DEFAULT - use 0.3*s
SD - use 0.3*s
NORMAL - use 2.5*s/n**(1/3)
NORMAL CORRECTED - start with 2.5*s/n**(1/3). If the
skewness is between 0 and 3, multiply
this by the correction factor:
1/(1 - 0.006*skew + 0.27*skew**2 -
0.0069*skew**3).
If the kurtosis - 3 is between 0 and 6,
multiply by the correction factor:
1 - 0.2*(1 - EXP(-0.7*(kurt - 3)))
IQ - use 2.603*IQ/N**(1/3) where IQ is the
interquartile range
The NORMAL width is an optimal choice (in the sense of
minimizing the integrated mean square error of the histogram)
if the data is in fact normal. The NORMAL CORRECTED provides
correction factors for moderate skewness and kurtosis. The
IQ replaces s with a robust estimate of scale (the
interquartile range) and should provide a reasonable bin width
for a wide range of underlying distributions.
Since the "optimal" choice of bin width is dependent on
the underlying distribution of the data, it is difficult
to provide a default bin width that will work well in all
cases (we are typically using the histogram to help determine
what that underlying distribution actually is).
An explicit CLASS WIDTH command will override the default
class width algorithm.
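The class width rules listed above can be written out in Python as a
sketch. The function name class_width is hypothetical. The
moment-based definitions of skewness and kurtosis and the
interpolated quartiles are assumptions; Dataplot may compute these
quantities differently.

```python
import math

def class_width(y, method="DEFAULT"):
    """Class width for a histogram of y under the rules listed above."""
    n = len(y)
    mean = sum(y) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in y) / (n - 1))  # sample sd
    if method in ("DEFAULT", "SD"):
        return 0.3 * s
    if method == "IQ":
        ys = sorted(y)

        def quantile(p):
            # assumed quartile rule: linear interpolation of sorted data
            pos = p * (n - 1)
            lo = int(pos)
            hi = min(lo + 1, n - 1)
            return ys[lo] + (pos - lo) * (ys[hi] - ys[lo])

        return 2.603 * (quantile(0.75) - quantile(0.25)) / n ** (1 / 3)
    width = 2.5 * s / n ** (1 / 3)
    if method == "NORMAL":
        return width
    if method == "NORMAL CORRECTED":
        m2 = sum((v - mean) ** 2 for v in y) / n
        skew = (sum((v - mean) ** 3 for v in y) / n) / m2 ** 1.5
        kurt = (sum((v - mean) ** 4 for v in y) / n) / m2 ** 2
        if 0 < skew < 3:
            width *= 1 / (1 - 0.006 * skew + 0.27 * skew ** 2
                          - 0.0069 * skew ** 3)
        if 0 < kurt - 3 < 6:
            width *= 1 - 0.2 * (1 - math.exp(-0.7 * (kurt - 3)))
        return width
    raise ValueError("unknown method: " + method)
```

For symmetric, short-tailed data neither correction factor applies,
so NORMAL CORRECTED and NORMAL give the same width.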
c. For the chi-square goodness of fit test, it is usually
recommended that classes with fewer than 5 observations be
combined in order to obtain a reasonably accurate
approximation. Given data that is binned into equal size
bins, you can automatically combine bins with small
frequencies with the commands
LET MINSIZE =
LET Y3 XLOW XHIGH = COMBINE FREQUENCY TABLE Y2 X2
The variables XLOW and XHIGH will contain the lower and upper
boundary values for the classes (since bins will no longer be
of equal length), respectively. The value for MINSIZE defines
the minimum frequency for a class (it defaults to 5).
You can then generate a chi-square goodness of fit test
with the command
CHISQUARE GOODNESS OF FIT Y3 XLOW XHIGH
A typical sequence of commands for generating a chi-square
goodness of fit test for a discrete distribution, starting
from raw data, is
LET AMIN = MINIMUM Y
LET AMAX = MAXIMUM Y
CLASS LOWER AMIN
CLASS UPPER AMAX
CLASS WIDTH 1
LET Y2 X2 = BINNED Y
LET MINSIZE = 5
LET Y3 XLOW XHIGH = COMBINE FREQUENCY TABLE Y2 X2
CHISQUARE GOODNESS OF FIT Y3 XLOW XHIGH
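The idea of combining sparse bins can be sketched in Python. The
merging rule used here (accumulate bins from left to right until the
minimum count is reached, then fold any leftover into the last
combined bin) is an assumption; the rule Dataplot applies is not
spelled out in these notes, and combine_frequency_table is a
hypothetical helper.

```python
def combine_frequency_table(freq, mid, width, minsize=5):
    """Merge equal-width bins (centered at mid) into bins of >= minsize."""
    counts, lows, highs = [], [], []
    acc = 0
    low = None
    for f, m in zip(freq, mid):
        if low is None:
            low = m - width / 2          # start a new combined bin
        acc += f
        if acc >= minsize:
            counts.append(acc)
            lows.append(low)
            highs.append(m + width / 2)
            acc = 0
            low = None
    if low is not None:                  # leftover bins at the upper end
        if counts:
            counts[-1] += acc            # fold them into the last bin
            highs[-1] = mid[-1] + width / 2
        else:
            counts.append(acc)           # total count never reached minsize
            lows.append(low)
            highs.append(mid[-1] + width / 2)
    return counts, lows, highs
```

The three returned lists play the roles of Y3, XLOW, and XHIGH in the
command sequence above.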
d. The CORRELATION MATRIX and COVARIANCE MATRIX compute the
correlation and covariance matrices, respectively, of the
columns of a matrix. If you would like these to be
generated from the rows of the matrix, you can enter the
commands
SET CORRELATION MATRIX DIRECTION ROW
SET COVARIANCE MATRIX DIRECTION ROW
To reset to the columns, enter
SET CORRELATION MATRIX DIRECTION COLUMN
SET COVARIANCE MATRIX DIRECTION COLUMN
7) Bug Fixes:
a. There was a bug reading numbers of the form
-.23
In this case, the minus sign was being lost. You can
work around this by entering the number as
-0.23
This bug is fixed in the current version.
NOTE: This bug was introduced in the 1/2004 version.
b. There was a bug reading rows containing a single character.
This has been fixed. If you encounter this bug, you can
work around it by inserting a leading space in the data
file.
NOTE: This bug was introduced in the 1/2004 version.
c. The SET commands that accepted file names as arguments did
not support quoting. Enclosing the file name in quotes is
required when the file name contains spaces or hyphens.
This has been corrected.
d. There was a bug in the SUMMARY command where in some cases
it did not extract the correct data. This has been fixed.
e. There was a bug in the KAPLAN MEIER PLOT command that caused
the censoring variable to not be recognized. This has been
corrected.
-------------------------------------------------------------------------
The following enhancements were made to DATAPLOT February - May 2004.
-------------------------------------------------------------------------
1) The following updates were made for probability distributions.
a. Support was added for the following new distributions.
Log-skew-normal distribution:
LET A = LSNCDF(X,LAMBDA,SD) - cdf of log-skew-normal
distribution
LET A = LSNPDF(X,LAMBDA,SD) - pdf of log-skew-normal
distribution
LET A = LSNPPF(P,LAMBDA,SD) - ppf of log-skew-normal
distribution
Log-skew-t distribution:
LET A = LSTCDF(X,NU,LAMBDA,SD) - cdf of log-skew-t
distribution
LET A = LSTPDF(X,NU,LAMBDA,SD) - pdf of log-skew-t
distribution
LET A = LSTPPF(P,NU,LAMBDA,SD) - ppf of log-skew-t
distribution
G-and-H distribution:
LET A = GHCDF(X,G,H) - cdf of g-and-h distribution
LET A = GHPDF(X,G,H) - pdf of g-and-h distribution
Note that the ppf function was added in a previous update.
Hermite distribution:
LET A = HERCDF(X,A,B) - cdf of Hermite distribution
LET A = HERPDF(X,A,B) - pdf of Hermite distribution
LET A = HERPPF(P,A,B) - ppf of Hermite distribution
Yule distribution:
LET A = YULCDF(X,P) - cdf of Yule distribution
LET A = YULPDF(X,P) - pdf of Yule distribution
LET A = YULPPF(P,P) - ppf of Yule distribution
b. The following pdf functions were added (these distributions
previously supported the cdf and ppf functions).
LET A = NCTPDF(X,NU,LAMBDA) - pdf of non-central t
LET A = DNTPDF(X,NU,L1,L2) - pdf of doubly non-central t
LET A = NCCPDF(X,NU,LAMBDA) - pdf of non-central chi-square
LET A = NCFPDF(X,NU1,NU2,L1) - pdf of non-central F
LET A = DNFPDF(X,NU1,NU2,L1,L2) - pdf of doubly non-central F
LET A = NCBPDF(X,A,B,LAMBDA) - pdf of non-central Beta
These pdf functions are computed by taking the numerical
derivative of the corresponding cdf function. You may
at times get warning messages that the derivative has not
converged with sufficient accuracy (this occurs most frequently
with the non-central Beta distribution).
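The idea of obtaining a pdf from a cdf by numerical differentiation
can be sketched in Python. The standard normal cdf stands in for the
non-central distributions (which are not available in the Python
standard library), and the central difference with a fixed step h is
an assumption; the differentiation scheme and convergence test used
by Dataplot are not described here.

```python
import math

def normal_cdf(x):
    """Standard normal cdf, used only as a stand-in for this sketch."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def pdf_from_cdf(cdf, x, h=1e-4):
    """Approximate the pdf at x by a central difference of the cdf."""
    return (cdf(x + h) - cdf(x - h)) / (2.0 * h)
```

A step that is too large gives a biased estimate, while a step that
is too small amplifies rounding error in the cdf, which is one reason
a derivative of this kind may fail to converge to the requested
accuracy.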
c. The following enhancements were made to maximum likelihood
estimation.
1. The binomial case now generates lower and upper confidence
limits based on the Agresti and Coull approximation.
2. The lognormal case now generates confidence limits for
the shape and scale parameters.
3. Support was added for the following distributions:
LOGARITHMIC SERIES MAXIMUM LIKELIHOOD Y
GEOMETRIC MAXIMUM LIKELIHOOD Y
BETA BINOMIAL MAXIMUM LIKELIHOOD Y
NEGATIVE BINOMIAL MAXIMUM LIKELIHOOD Y
HYPERGEOMETRIC MAXIMUM LIKELIHOOD Y
HERMITE MAXIMUM LIKELIHOOD Y
YULE MAXIMUM LIKELIHOOD Y
FATIGUE LIFE MAXIMUM LIKELIHOOD Y
GEOMETRIC EXTREME EXPONENTIAL MAXIMUM LIKELIHOOD Y
FOLDED NORMAL MAXIMUM LIKELIHOOD Y
CAUCHY MAXIMUM LIKELIHOOD Y
4. For the Johnson SU/SB distribution, a percentile
estimator is now available (a method of moments
estimator was previously available):
JOHNSON PERCENTILE Y
Note that this estimator will automatically determine
whether an SB or SU estimator is appropriate. Also, you
can define a constant Z used by this estimator by
entering the command (before the JOHNSON PERCENTILE
command):
LET Z =
This value is typically set between 0.5 and 1 with a
default value of 0.54. As the sample size gets larger,
values of Z closer to 1 are appropriate (e.g.,
for a sample of size 1,000, a value of 0.8 works well).
5. Support for LaTeX and HTML output was added to most
supported distributions.
d. The following random number generators were added:
LET NU =
LET LAMBDA =
LET Y = NONCENTRAL T RANDOM NUMBERS FOR I = 1 1 N
LET NU =
LET LAMBDA1 =
LET LAMBDA2 =
LET Y = DOUBLY NONCENTRAL T RANDOM NUMBERS FOR I = 1 1 N
LET NU =
LET LAMBDA =
LET Y = NONCENTRAL BETA RANDOM NUMBERS FOR I = 1 1 N
LET GAMMA =
LET Y = GENERALIZED LOGISTIC RANDOM NUMBERS FOR I = 1 1 N
LET GAMMA =
LET Y = GENERALIZED HALF-LOGISTIC RANDOM NUMBERS FOR I = 1 1 N
LET ALPHA =
LET BETA =
LET Y = HERMITE RANDOM NUMBERS FOR I = 1 1 N
LET P =
LET Y = YULE RANDOM NUMBERS FOR I = 1 1 N
LET A =
LET C =
LET Y = WARING RANDOM NUMBERS FOR I = 1 1 N
LET A =
LET B =
LET C =
LET Y = GENERALIZED WARING RANDOM NUMBERS FOR I = 1 1 N
The t, F, and chi-square random number generators were
updated to accept non-integer values for the degrees of
freedom parameters.
e. The following additions were made to the probability plot,
Kolmogorov-Smirnov goodness of fit, chi-square goodness of
fit, and ppcc plot commands:
LET LAMBDA =
LET SD =
LOG SKEW NORMAL PROBABILITY PLOT Y
LOG SKEW NORMAL KOLMOGOROV-SMIRNOV GOODNESS OF FIT Y
LOG SKEW NORMAL CHI-SQUARE GOODNESS OF FIT Y
LOG SKEW NORMAL PPCC PLOT Y
LET LAMBDA =
LET SD =
LET NU =
LOG SKEW T PROBABILITY PLOT Y
LOG SKEW T KOLMOGOROV-SMIRNOV GOODNESS OF FIT Y
LOG SKEW T CHI-SQUARE GOODNESS OF FIT Y
LET G =
LET H =
G AND H PROBABILITY PLOT Y
G AND H KOLMOGOROV-SMIRNOV GOODNESS OF FIT Y
G AND H CHI-SQUARE GOODNESS OF FIT Y
G AND H PPCC PLOT Y
LET ALPHA =
LET BETA =
HERMITE PROBABILITY PLOT Y
HERMITE CHI-SQUARE GOODNESS OF FIT Y
HERMITE PPCC PLOT Y
LET P =
YULE PROBABILITY PLOT Y
YULE CHI-SQUARE GOODNESS OF FIT Y
YULE PPCC PLOT Y
f. The Anderson Darling test was updated to support the
generalized Pareto distribution:
ANDERSON-DARLING GENERALIZED PARETO TEST Y
The maximum likelihood estimation for the generalized
Pareto is still undergoing algorithmic development, so
you should specify the shape and scale parameter for
the generalized Pareto (before invoking the Anderson-Darling
test) as follows:
LET GAMMA =
LET A =
g. An optional definition was added for the geometric
distribution.
The default definition for the geometric distribution is the
number of failures before the first success is obtained in
a sequence of Bernoulli trials. The alternate definition
is the number of trials up to and including the first
success in a series of Bernoulli trials. This definition
simply shifts the geometric distribution to start at X = 1
rather than X = 0.
To specify the alternate definition, enter the command
SET GEOMETRIC DEFINITION DLMF
To restore the default definition, enter the command
SET GEOMETRIC DEFINITION JOHNSON AND KOTZ
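The two definitions differ only by a shift of one in the argument, as
the following Python sketch of the probability mass function shows.
The function geometric_pmf is hypothetical; its definition argument
simply reuses the names from the SET GEOMETRIC DEFINITION command.

```python
def geometric_pmf(x, p, definition="JOHNSON AND KOTZ"):
    """Geometric probability mass function under either definition."""
    if definition == "JOHNSON AND KOTZ":
        # number of failures before the first success: x = 0, 1, 2, ...
        return p * (1 - p) ** x if x >= 0 else 0.0
    if definition == "DLMF":
        # number of trials up to and including the first success: x = 1, 2, ...
        return p * (1 - p) ** (x - 1) if x >= 1 else 0.0
    raise ValueError("unknown definition: " + definition)
```

The probability of x failures under the default definition equals the
probability of x + 1 trials under the alternate definition.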
h. The negative binomial was updated to support non-integer
arguments for the number of failures shape parameter
(i.e., k).
i. A number of bug fixes and algorithmic improvements were made
for the ppcc plots with two shape parameters and the random
number generation for a few distributions.
2. The following enhancements were made to the PPCC PLOT and
PROBABILITY PLOT commands.
a. For some long tailed distributions, there can be large
variability in the tails. This can distort the estimates
of location, PPA0, and scale, PPA1, of the line fitted
to the probability plot. To address this, Dataplot now
also returns PPA0BW and PPA1BW. These are the estimates
obtained by performing two iterations of biweight
weighting of the residuals.
In most cases, the use of PPA0 and PPA1 is preferred.
However, if the probability plot indicates the presence
of extreme outliers in the tails, PPA0BW and PPA1BW may
provide better estimates for the location and scale
parameters.
b. The following command was added as a variant of the
ppcc plot:
KS PLOT Y
preceded by the name of any of the distributions supported by
the PPCC PLOT command.
This plot uses a similar concept to the ppcc plot.
However, it uses the value of the Kolmogorov-Smirnov
goodness of fit statistic rather than the correlation
coefficient of the probability plot as the measure
of distributional fit. In this case, the goal is to minimize
the Kolmogorov-Smirnov goodness of fit statistic.
Although we are still developing experience with this
plot, a few preliminary recommendations are:
1. For most continuous distributions with one shape
parameter, the PPCC PLOT and KS PLOT generate similar
estimates for the shape parameter.
2. The KS PLOT seems to perform better for at least some
distributions with two shape parameters.
3. The KS PLOT generates a smoother plot for discrete
distributions.
For additional information, enter
HELP KS PLOT
c. For the PPCC PLOT and KS PLOT, the following command
allows you to specify the desired format for the
plot when there are two shape parameters:
SET PPCC FORMAT
For the default setting, TRACE, these plots are generated
as a multi-trace 2D plot. That is, the Y axis will
represent the correlation (or value of the
Kolmogorov-Smirnov statistic), the X axis will represent
the value of the second shape parameter, and each trace
will represent one of the values for the first shape
parameter.
If this value is set to 3D, the plot is represented as
a 3D surface plot.
3. Sometimes data may only be available in the form of a frequency
table. However, some Dataplot commands may expect the data
in a "raw" format. The following command was added to convert
frequency data to raw data:
LET Y = FREQUENCY TO RAW X FREQ
For example,
X FREQ
--------
0 3
1 2
2 4
would be converted to
0
0
0
1
1
2
2
2
2
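The conversion shown above amounts to repeating each value as many
times as its frequency. A minimal Python sketch (the function name
frequency_to_raw is chosen only to mirror the command) is:

```python
def frequency_to_raw(x, freq):
    """Expand a frequency table into the equivalent raw data."""
    raw = []
    for value, count in zip(x, freq):
        raw.extend([value] * int(count))   # repeat value count times
    return raw
```

Calling frequency_to_raw([0, 1, 2], [3, 2, 4]) reproduces the nine
values listed in the example.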
-------------------------------------------------------------------------
The following enhancements were made to DATAPLOT June 2003-January 2004.
-------------------------------------------------------------------------
1) The following enhancements were made to the Dataplot I/O
capabilities.
a) Previously, the Dataplot READ command was updated to
handle the syntax
READ FILE.DAT
In this case, Dataplot simply assigns the names X1, X2,
and so on to the variables. Many packages accept data
files where the first line contains the variable names.
To support this in Dataplot, do the following:
SET READ VARIABLE LABEL ON
READ FILE.DAT
In this case, Dataplot will interpret the first line
read as the variable names in the file.
b) Dataplot has previously not supported reading character
variables in data files (with the one exception of READ ROW
LABELS). If encountered, Dataplot would generate an error
message and not read the data file correctly. To address
this, we have added the command
SET CONVERT CHARACTER
Setting this to ERROR will continue the current Dataplot
action of reporting an error. This is recommended for the
case when a file is supposed to contain only numeric data
and the presence of character data is in fact indicative
of an error in the data file. Setting this to IGNORE will
instruct Dataplot to simply ignore any fields containing
character data. Setting this to ON will read character fields
and write them to the file "dpzchf.dat".
There are some restrictions on when Dataplot will try to
read character data:
1) This only applies to the variable read case. That
is, READ PARAMETER and READ MATRIX will ignore
character fields or treat them as an error.
2) Dataplot will only try to read character data from
a file. When reading from the keyboard (i.e., when
READ is specified with no file name), character data
will be ignored even when SET CONVERT CHARACTER ON is
specified.
3) This capability is not supported for the SERIAL READ
case.
4) The SET READ FORMAT command does not accept the
"A" format specification for reading character
fields.
Some of these restrictions may be addressed in subsequent
releases of Dataplot.
Enter HELP CONVERT CHARACTER for details.
c) The COLUMN LIMITS command has been updated to accept
variable arguments. For example,
COLUMN LIMITS LOWER UPPER
with LOWER and UPPER denoting variables (as opposed to
parameters) each with N elements. Dataplot will parse
the data file assuming that field one of the data is in
columns LOWER(1) to UPPER(1), field two of the data is
in LOWER(2) to UPPER(2) and so on. Note that only one
numeric or character variable will be read in each field.
Many programs, Excel for example, will write data to ASCII
files with the data values either left or right justified
to a given column. If the ASCII file is written so that
the decimal point is in a fixed column, then using the
SET READ FORMAT is typically recommended rather than
the COLUMN LIMITS with variable arguments.
If the data file contains columns of equal length, then
using this form of the COLUMN LIMITS command is not
necessary. However, there are two cases where it is useful:
1) If you only want to read selected fields in the data
file, then this form of the COLUMN LIMITS command
easily allows you to do this.
2) If the data columns are of unequal length, as ASCII
files created from Excel often are, then this form
of the COLUMN LIMITS allows these data files to be
read correctly. If a given field is empty, Dataplot
interprets it as a missing value.
By default, Dataplot will set the missing value to 0.
If you would like to specify a value other than zero,
then enter the command
SET READ MISSING VALUE
followed by the desired value.
Enter HELP COLUMN LIMITS for details.
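The field-by-field parsing described above can be sketched in Python.
The function read_fixed_columns is hypothetical and handles numeric
fields only; it treats column numbers as 1-based and inclusive and
returns the missing value (0 by default, as above) for an empty field.

```python
def read_fixed_columns(line, lower, upper, missing=0.0):
    """Read one numeric value from each column range lower(i)..upper(i)."""
    values = []
    for lo, hi in zip(lower, upper):
        field = line[lo - 1:hi].strip()    # columns are 1-based, inclusive
        values.append(float(field) if field else missing)
    return values
```

Listing only some of the column ranges reads only those fields, and a
blank field in a ragged file is returned as the missing value.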
d) If Excel writes a comma delimited ASCII file (.CSV), then
missing values are denoted with ",,". In order to interpret
these files correctly, you can enter the command
SET READ DELIMITER
followed by the desired delimiter. The default
delimiter is a comma.
If Dataplot encounters the delimiter before any valid data
has been found, it interprets this as a missing value.
Missing values are set to 0 unless a SET READ MISSING VALUE
command has been entered (see above).
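The treatment of empty fields in a delimited file can be sketched in
Python. The function split_delimited is hypothetical and handles
numeric fields only. Note one assumption: this sketch also treats a
trailing delimiter as a trailing missing value, which may or may not
match what Dataplot does.

```python
def split_delimited(line, delimiter=",", missing=0.0):
    """Split a delimited line; empty fields become the missing value."""
    values = []
    for field in line.split(delimiter):
        field = field.strip()
        # a delimiter reached before any data marks a missing value
        values.append(float(field) if field else missing)
    return values
```

For example, the line "1,,3" yields three values with the middle one
set to the missing value.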
We have added a section in the online help files that provides
general guidance on reading ASCII data files in Dataplot.
This consolidates information documented under a number of
different commands. For details, enter
HELP ASCII FILES
2) The SET CONVERT CHARACTER ON command allows you to read
character variables. We have added the following commands
that operate on these character variables.
a) Many character variables are in fact group-id variables.
In order to allow you to use these group-id variables
in a numeric context, the following two commands were added:
LET Y = CHARACTER CODE IX
LET Y = ALPHABETIC CHARACTER CODE IX
with IX denoting the name of a character variable that
has been read into Dataplot and Y denoting the name of a
numeric variable that will be created by this command.
Both of these commands identify the unique rows in the
character variable (Dataplot checks for exact matches; it
does not try to guess if a typo has occurred, etc.). If
there are K unique rows, Dataplot will generate coded values
as the integer values from 1 to K. The distinction is that
CHARACTER CODE will perform the coding in the order that the
unique rows are encountered in the file while ALPHABETIC
CHARACTER CODE will sort the unique character rows and
code based on the alphabetic order.
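The two coding orders can be illustrated with a short Python sketch.
The function character_code is hypothetical; it uses exact,
case-sensitive matching, and the sort order is that of Python strings,
which may differ from the collating order Dataplot uses.

```python
def character_code(values, alphabetic=False):
    """Code the unique strings in values as the integers 1 to K."""
    unique = []
    for v in values:
        if v not in unique:          # exact matches only
            unique.append(v)         # keeps order of first appearance
    if alphabetic:
        unique = sorted(unique)      # code by alphabetic order instead
    codes = {v: k for k, v in enumerate(unique, start=1)}
    return [codes[v] for v in values]
```

For the values MAR, JAN, MAR, FEB the first form codes MAR as 1
(it is encountered first) while the alphabetic form codes FEB as 1.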
b) Character variables are frequently used as group-id
variables (e.g., Male and Female to identify sex). The
following command creates a group-id variable from a
character variable:
LET IG = GROUP LABELS MONTH
with MONTH denoting the name of a character variable.
The name IG will be used to denote a group-id variable.
The number of rows in IG will be equal to the number of
unique rows in MONTH. Up to 5 group-id variables can be
created and the maximum number of rows for a group-id
variable is the maximum number of rows for a numeric
variable divided by 100.
c) You can create a row label variable with the READ ROW LABEL
command. Alternatively, you can now enter the command
LET ROWLABEL = MONTH
with MONTH denoting the name of a character variable.
Note that the variable name on the left hand side of the
"=" must be ROWLABEL for this command to work.
d) The TIC MARK LABEL FORMAT and TIC MARK LABEL CONTENT commands
have been updated to support the following:
TIC MARK LABEL FORMAT GROUP LABEL
TIC MARK LABEL CONTENT IG
TIC MARK LABEL FORMAT ROW LABEL
TIC MARK LABEL FORMAT VARIABLE
TIC MARK LABEL CONTENT YVAR
Setting the tic mark label format to GROUP LABEL instructs
Dataplot to use a group label variable for the contents
of the tic mark label. The TIC MARK LABEL CONTENT command
is then used to specify the name of the group label variable
to use.
Setting the tic mark label format to VARIABLE is similar to
the GROUP LABEL case. However, in this case a numeric
variable is specified rather than a group label variable.
This allows you to place your own numeric tic mark labels.
For example, you can use this to generate a "reverse" axis.
Setting the tic mark label format to ROW LABEL allows you
to use the row labels as the content for the tic mark labels.
For example, this can be useful for labeling a bar chart.
3) Support for the following univariate distributions was added:
LET A = TRACDF(X,A,B,C,D) - cdf of trapezoid distribution
LET A = TRAPDF(X,A,B,C,D) - pdf of trapezoid distribution
LET A = TRAPPF(P,A,B,C,D) - ppf of trapezoid distribution
LET A = GTRCDF(X,A,B,C,D,NU1,NU3,ALPHA) - cdf of generalized
trapezoid distribution
LET A = GTRPDF(X,A,B,C,D,NU1,NU3,ALPHA) - pdf of generalized
trapezoid distribution
LET A = GTRPPF(P,A,B,C,D,NU1,NU3,ALPHA) - ppf of generalized
trapezoid distribution
LET A = FTCDF(X,NU) - cdf of folded t distribution
LET A = FTPDF(X,NU) - pdf of folded t distribution
LET A = FTPPF(P,NU) - ppf of folded t distribution
LET A = SNCDF(X,ALPHA) - cdf of skew normal distribution
LET A = SNPDF(X,ALPHA) - pdf of skew normal distribution
LET A = SNPPF(P,ALPHA) - ppf of skew normal distribution
LET A = STCDF(X,NU,ALPHA) - cdf of skew t distribution
LET A = STPDF(X,NU,ALPHA) - pdf of skew t distribution
LET A = STPPF(P,NU,ALPHA) - ppf of skew t distribution
LET A = SLACDF(X) - cdf of slash distribution
LET A = SLAPPF(P) - ppf of slash distribution
LET A = IBCDF(X,ALPHA,BETA) - cdf of inverted beta distribution
LET A = IBPPF(P,ALPHA,BETA) - ppf of inverted beta distribution
LET A = GHCDF(X,G,H) - cdf of g-and-h distribution
LET A = GHPPF(P,G,H) - ppf of g-and-h distribution
LET A = MAKCDF(X,XI,L,T) - cdf of Gompertz-Makeham distribution
LET A = MAKPDF(X,XI,L,T) - pdf of Gompertz-Makeham distribution
LET A = MAKPPF(P,XI,L,T) - ppf of Gompertz-Makeham distribution
LET A = ZIPPDF(X,ALPHA) - pdf of Zipf distribution
Note that the IBPDF and SLAPDF functions were implemented
previously. The GHPDF function is still under development.
You can generate random numbers for these distributions
with the commands
LET A =
LET B =
LET C =
LET D =
LET Y = TRAPEZOID RANDOM NUMBERS FOR I = 1 1 N
LET A =
LET B =
LET C =
LET D =
LET NU1 =
LET NU3 =
LET ALPHA =
LET Y = GENERALIZED TRAPEZOID RANDOM NUMBERS FOR I = 1 1 N
LET NU =
LET Y = FOLDED T RANDOM NUMBERS FOR I = 1 1 N
LET ALPHA =
LET Y = SKEWED NORMAL RANDOM NUMBERS FOR I = 1 1 N
LET NU =
LET ALPHA =
LET Y = SKEWED T RANDOM NUMBERS FOR I = 1 1 N
LET G =
LET H =
LET Y = G AND H RANDOM NUMBERS FOR I = 1 1 N
LET XI =
LET LAMBDA =
LET THETA =
LET Y = GOMPERTZ-MAKEHAM RANDOM NUMBERS FOR I = 1 1 N
LET ALPHA =
LET Y = ZIPF RANDOM NUMBERS FOR I = 1 1 N
Random numbers for the slash and inverted beta distributions
were added previously.
You can generate the following probability plots and goodness
of fit tests
LET A = <value>
LET B = <value>
LET C = <value>
LET D = <value>
TRAPEZOID PROBABILITY PLOT Y
TRAPEZOID KOLMOGOROV-SMIRNOV GOODNESS OF FIT TEST Y
TRAPEZOID CHI-SQUARE GOODNESS OF FIT TEST Y
LET A = <value>
LET B = <value>
LET C = <value>
LET D = <value>
LET NU1 = <value>
LET NU3 = <value>
LET ALPHA = <value>
GENERALIZED TRAPEZOID PROBABILITY PLOT Y
GENERALIZED TRAPEZOID KOLMOGOROV-SMIRNOV GOODNESS OF FIT TEST Y
GENERALIZED TRAPEZOID CHI-SQUARE GOODNESS OF FIT TEST Y
LET NU = <value>
FOLDED T PROBABILITY PLOT Y
FOLDED T KOLMOGOROV-SMIRNOV GOODNESS OF FIT TEST Y
FOLDED T CHI-SQUARE GOODNESS OF FIT TEST Y
FOLDED T PPCC PLOT Y
LET NU = <value>
LET LAMBDA = <value>
SKEW T PROBABILITY PLOT Y
SKEW T KOLMOGOROV-SMIRNOV GOODNESS OF FIT TEST Y
SKEW T CHI-SQUARE GOODNESS OF FIT TEST Y
SKEW T PPCC PLOT Y
LET LAMBDA = <value>
SKEW NORMAL PROBABILITY PLOT Y
SKEW NORMAL KOLMOGOROV-SMIRNOV GOODNESS OF FIT TEST Y
SKEW NORMAL CHI-SQUARE GOODNESS OF FIT TEST Y
SKEW NORMAL PPCC PLOT Y
LET G = <value>
LET H = <value>
G AND H PROBABILITY PLOT Y
G AND H KOLMOGOROV-SMIRNOV GOODNESS OF FIT TEST Y
G AND H CHI-SQUARE GOODNESS OF FIT TEST Y
G AND H PPCC PLOT Y
LET XI = <value>
LET LAMBDA = <value>
LET THETA = <value>
GOMPERTZ-MAKEHAM PROBABILITY PLOT Y
GOMPERTZ-MAKEHAM KOLMOGOROV-SMIRNOV GOODNESS OF FIT TEST Y
GOMPERTZ-MAKEHAM CHI-SQUARE GOODNESS OF FIT TEST Y
c) Added the following commands
JOHNSON SU MOMENTS Y
JOHNSON SB MOMENTS Y
to compute method of moment estimates for the Johnson SU
and Johnson SB distributions.
d) The GUMBEL MAXIMUM LIKELIHOOD command was extended to support
both the minimum and maximum cases (the previous version was
restricted to the maximum case). Before the GUMBEL MAXIMUM
LIKELIHOOD command, enter the command
SET MINMAX 1
to specify the minimum case and
SET MINMAX 2
to specify the maximum case.
e) Enter the following command to generate Dirichlet random numbers:
LET M = DIRICHLET RANDOM NUMBERS ALPHA N
with ALPHA denoting a vector containing the shape parameters of
the Dirichlet distribution and N denoting a scalar that specifies
the number of rows to generate. M will be a matrix with N rows
and k columns (where k is the number of elements in the ALPHA
vector).
You can also compute the Dirichlet probability density or the
log of the Dirichlet probability density with the commands
LET M = DIRICHLET PDF X ALPHA
LET M = DIRICHLET LOG PDF X ALPHA
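As an illustration of the shape of the output described above (N rows, k columns, each row summing to 1), the following Python sketch draws Dirichlet variates by normalizing independent gamma variates. This is a standard textbook method and is only assumed here for illustration; it is not necessarily the algorithm Dataplot uses. The function name dirichlet_rows is hypothetical.

```python
import random

def dirichlet_rows(alpha, n, seed=None):
    """Return n rows, each one Dirichlet(alpha) draw (k values summing to 1)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        # Draw independent Gamma(alpha_j, 1) variates and normalize by their sum.
        g = [rng.gammavariate(a, 1.0) for a in alpha]
        total = sum(g)
        rows.append([v / total for v in g])
    return rows

# ALPHA has k = 3 shape parameters and N = 5, so M has 5 rows and 3 columns.
m = dirichlet_rows([1.0, 2.0, 3.0], n=5, seed=42)
print(len(m), len(m[0]))
```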
f) Enter the following command to generate correlated uniform
random numbers:
LET U = MULTIVARIATE UNIFORM RANDOM NUMBERS SIGMA N
with SIGMA denoting the variance-covariance matrix of
a multivariate normal distribution and N denoting the number
of rows to generate.
g) The Anderson-Darling goodness of fit test was enhanced to
include the following distributions:
ANDERSON-DARLING LOGISTIC TEST Y
ANDERSON-DARLING DOUBLE EXPONENTIAL TEST Y
ANDERSON-DARLING UNIFORM TEST Y
The uniform case is for the uniform distribution on the
(0,1) interval. This can also be used for fully specified
distributions (i.e., the shape, location, and scale
parameters are not estimated from the data). Simply
calculate the appropriate CDF function with the specified
shape, location, and scale parameters (this converts the
data to the (0,1) interval) and apply the test for a
uniform distribution.
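The idea in the last paragraph (transform the data with a fully specified CDF, then test the result for uniformity) can be sketched in Python as follows. The sketch assumes a normal distribution with known location 10 and scale 1 and uses the usual computing formula for the Anderson-Darling statistic; the function names and the data values are hypothetical, and no critical values are computed.

```python
import math

def normal_cdf(x, loc=0.0, scale=1.0):
    """CDF of a normal distribution with fully specified location and scale."""
    return 0.5 * (1.0 + math.erf((x - loc) / (scale * math.sqrt(2.0))))

def anderson_darling_uniform(u):
    """Anderson-Darling A^2 statistic for the uniform (0,1) distribution."""
    u = sorted(u)
    n = len(u)
    s = 0.0
    for i in range(1, n + 1):
        # Pair the i-th smallest value with the i-th largest value.
        s += (2 * i - 1) * (math.log(u[i - 1]) + math.log(1.0 - u[n - i]))
    return -n - s / n

y = [9.2, 10.1, 10.8, 9.7, 11.3, 10.4, 8.9, 10.0]
# Step 1: apply the specified CDF, which maps the data to the (0,1) interval.
u = [normal_cdf(v, loc=10.0, scale=1.0) for v in y]
# Step 2: apply the test for a uniform distribution to the transformed data.
a2 = anderson_darling_uniform(u)
print(a2)
```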
h) The following maximum likelihood estimation commands were
added:
LOGISTIC MAXIMUM LIKELIHOOD Y
UNIFORM MAXIMUM LIKELIHOOD Y
BETA MAXIMUM LIKELIHOOD Y
The BETA and UNIFORM cases generate both method of moments and
maximum likelihood estimates.
The beta case estimates the lower and upper limits of the
data from the minimum and maximum data values, respectively,
and then computes the maximum likelihood estimates for the
alpha and beta shape parameters.
i) Support was added for the following random number
generators:
1) FIBONACCI CONGRUENTIAL - a mixture of the Fibonacci generator
with a congruential generator
2) MERSENNE TWISTER - Fortran 90 implementation of the
Mersenne twister generator (may not be
valid on platforms that are compiled
with Fortran 77 compilers)
Enter HELP RANDOM NUMBER GENERATOR for details.
j) Fixed the inverse gaussian and reciprocal inverse gaussian
probability functions. The MU parameter was treated as a
location parameter in the original implementation. However, it
is really a shape parameter. So IGPDF and RIGPDF can now be
called via
IGPDF(X,GAMMA,MU,LOC,SCALE)
RIGPDF(X,GAMMA,MU,LOC,SCALE)
The MU parameter is treated as an optional parameter (LOC and
SCALE are also optional). MU is set to 1 if it is omitted.
The MU parameter can also be specified for random numbers
and probability plots. If the MU parameter is not set, it
will automatically be set to 1 (no error message is printed).
The PPCC plot for these two distributions is now generated for
both the gamma and mu parameters (i.e., a 3D plot is generated).
If you want the PPCC plot assuming MU = 1 for the inverse
gaussian case, you can use the WALD PPCC PLOT command (the
Wald distribution is a special case of the inverse gaussian
where MU is set to 1).
4) Added the following analysis commands:
a) Support for linear and quadratic calibration is available via
the following commands:
LINEAR CALIBRATION Y X Y0
QUADRATIC CALIBRATION Y X Y0
The LINEAR CALIBRATION command performs a linear calibration
analysis using eight different methods. The QUADRATIC
CALIBRATION command performs a quadratic calibration analysis
using three different methods.
Enter HELP CALIBRATION for details.
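To make the calibration problem concrete: given calibration data (X, Y) and a new response Y0, the goal is to estimate the X0 that produced it. The Python sketch below shows only the classical (inverse prediction) point estimate, X0 = (Y0 - b0)/b1. It is assumed here purely for illustration; the source does not list the eight methods, and no uncertainty interval is computed. The function names are hypothetical.

```python
def fit_line(x, y):
    """Least squares fit of y = b0 + b1*x; returns (b0, b1)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1

def classical_calibration(x, y, y0):
    """Point estimate of the X value corresponding to a new response y0."""
    b0, b1 = fit_line(x, y)
    return (y0 - b0) / b1

x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]      # exactly y = 1 + 2*x
x0 = classical_calibration(x, y, 7.0)
print(x0)
```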
b) The Friedman test for two-way analysis of variance on ranks
is supported with the command
FRIEDMAN TEST Y BLOCK TREATMENT
Enter
HELP FRIEDMAN TEST
for details.
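For readers unfamiliar with the Friedman statistic, the following Python sketch computes it for a table with one row per block and one column per treatment. It uses the textbook formula without a correction for ties and is an illustration only, not Dataplot's code; the function names are hypothetical.

```python
def average_ranks(values):
    """Ranks 1..k of the values, with tied values given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks

def friedman_statistic(table):
    """Friedman statistic for table[block][treatment] (no tie correction)."""
    n = len(table)        # number of blocks
    k = len(table[0])     # number of treatments
    rank_sums = [0.0] * k
    for row in table:
        # Rank the treatments within each block, then sum ranks by treatment.
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)

table = [[1.0, 2.0, 3.0],
         [4.0, 5.0, 6.0],
         [7.0, 8.0, 9.0]]
q = friedman_statistic(table)
print(q)
```

Under the null hypothesis of no treatment effect, the statistic is usually compared with a chi-square distribution with k-1 degrees of freedom.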
c) The frequency and cumulative sum tests for randomness are
supported with the commands
FREQUENCY TEST Y
LET M = <value>
FREQUENCY WITHIN A BLOCK TEST Y
CUMULATIVE SUM TEST Y
These tests are used for sequences of 0's and 1's (Dataplot
just checks for two distinct values, the higher value is
set to 1 and the lower value is set to 0).
To test a uniform random number generator, do something like
the following:
LET N = 1
LET P = 0.5
LET Y = BINOMIAL RANDOM NUMBERS FOR I = 1 1 10000
FREQUENCY TEST Y
For details, enter
HELP FREQUENCY TEST
HELP CUMULATIVE SUM TEST
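The following Python sketch illustrates the kind of computation a frequency (monobit) test performs. It maps the higher of the two distinct values to 1 and the lower to 0, as described above, and uses the normal-approximation p-value, erfc(|S|/sqrt(2n)), where S is the sum of the +1/-1 coded values. That formula is an assumption about the test being described, not taken from the Dataplot source, and the function name is hypothetical.

```python
import math

def frequency_test(bits):
    """Return (test statistic, p-value) for a sequence with two distinct values."""
    values = sorted(set(bits))
    if len(values) != 2:
        raise ValueError("sequence must contain exactly two distinct values")
    high = values[1]
    n = len(bits)
    # Code the higher value as +1 and the lower value as -1, then sum.
    s = sum(1 if b == high else -1 for b in bits)
    s_obs = abs(s) / math.sqrt(n)
    p_value = math.erfc(s_obs / math.sqrt(2.0))
    return s_obs, p_value

balanced = [0, 1] * 50           # equal numbers of 0's and 1's
lopsided = [1] * 99 + [0]        # almost all 1's
print(frequency_test(balanced))
print(frequency_test(lopsided))
```

A small p-value (for example, less than 0.01) indicates that the proportions of 0's and 1's differ by more than expected for a random sequence.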
5) The following enhancements were made to the BOOTSTRAP PLOT command
a) Extended the grouped case to handle two groups (previously
one group was supported).
b) For the grouped case (either one or two groups), the following
information is written to file:
DPST1F.DAT - the full set of bootstrap estimates for the
statistic (group-id in column 1, bootstrap
statistic in column 2)
DPST2F.DAT - writes the group-id and the corresponding mean,
standard deviation, and the 0.025, 0.975, 0.05,
0.95, 0.005, and 0.995 quantiles
c) Added the following form of the command
BCA BOOTSTRAP PLOT Y
This generates BCa bootstrap confidence intervals as defined
by Efron. At the expense of additional computation, it
generates bootstrap confidence intervals that are second order
accurate (the percentile bootstrap confidence intervals are
first order accurate).
Enter HELP BOOTSTRAP PLOT for further information.
6) The CAPTURE HTML (for generating Dataplot output in HTML format)
capability has been extended to additional analysis commands.
In addition, Dataplot output can now be generated in Latex format
with the command
CAPTURE LATEX file.tex
with "file.tex" denoting the name where the Latex output is
generated. An END OF CAPTURE terminates the generation of
Latex output.
The CAPTURE HTML and CAPTURE LATEX commands now generate formatted
output for the following commands:
SUMMARY
TABULATE
CROSS TABULATE
CONSENSUS MEAN
CONSENSUS MEAN PLOT
LINEAR CALIBRATION
QUADRATIC CALIBRATION
YATES ANALYSIS
FIT
ANOVA
FRIEDMAN TEST
WILK SHAPIRO
ANDERSON DARLING
KOLMOGOROV-SMIRNOV GOODNESS OF FIT
CHI-SQUARE GOODNESS OF FIT
EXPONENTIAL MAXIMUM LIKELIHOOD
GUMBEL MAXIMUM LIKELIHOOD
WEIBULL MAXIMUM LIKELIHOOD
LOGISTIC MAXIMUM LIKELIHOOD
PARETO MAXIMUM LIKELIHOOD
UNIFORM MAXIMUM LIKELIHOOD
BETA MAXIMUM LIKELIHOOD
CONFIDENCE LIMITS
DIFFERENCE OF MEANS CONFIDENCE LIMITS
BIWEIGHT LOCATION CONFIDENCE LIMITS
TRIMMED MEAN CONFIDENCE LIMITS
MEDIAN/QUANTILE CONFIDENCE LIMITS
T TEST
F TEST
CHI-SQUARE TEST
GRUBB TEST
LEVENE TEST
FREQUENCY TEST
FREQUENCY WITHIN A BLOCK TEST
CUSUM TEST
In addition, WRITE HTML and WRITE LATEX commands have been added
to allow the generation of one-way tables.
We plan to implement this capability for most of the analysis
commands over the course of the next year or so. In addition,
we are investigating a similar capability for Rich Text
Format (RTF), which would allow importation into Word and
other word processing programs.
Output from unsupported commands is enclosed in "<PRE>" and
"</PRE>" tags for HTML and within the "\begin{verbatim}"
environment for Latex. Enter
HELP HTML
HELP LATEX
for details.
7) Dataplot has previously supported a LET ... = DERIVATIVE ...
command that generates analytic derivatives. However, this was
supported for a rather limited set of functions (enter
HELP DERIVATIVE for details). We have added the commands
LET A = NUMERICAL DERIVATIVE F WRT X FOR X = X0
LET Y = NUMERICAL DERIVATIVE F WRT X
to compute derivatives numerically. The distinction in the
above syntax is that the first command computes a single
derivative while the second syntax computes the derivative
for a vector of values (define X to contain the points at
which you want the derivative computed). For details, enter
HELP NUMERICAL DERIVATIVE
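The two forms of the command (a single point versus a vector of points) can be illustrated with the Python sketch below. It uses a central difference with a fixed step size, which is assumed here for illustration; the source does not state which numerical method Dataplot uses. The function names and the step size h are hypothetical.

```python
import math

def numerical_derivative(f, x0, h=1e-5):
    """Central difference estimate of f'(x0)."""
    return (f(x0 + h) - f(x0 - h)) / (2.0 * h)

def numerical_derivative_vector(f, xs, h=1e-5):
    """Central difference estimate of f' at each point in xs."""
    return [numerical_derivative(f, x, h) for x in xs]

# Single point: derivative of x**2 at x = 3 (exact value is 6).
a = numerical_derivative(lambda x: x ** 2, 3.0)

# Vector of points: derivative of sin(x), which should be close to cos(x).
xs = [0.0, 0.5, 1.0]
ys = numerical_derivative_vector(math.sin, xs)
print(a, ys)
```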
8) Fixed the following bugs:
a) Fixed the READ and WRITE commands to handle hyphens inside
of quoted file names correctly (only applies if
SET FILE NAME QUOTE ON entered).
b) The substitution character, "^", was modified to treat
anything other than a letter, a number, or an underscore
as a terminator for the Dataplot name. Note that although you
can use some special characters in Dataplot names, this
is strongly discouraged.
c) Fixed a bug where the file name restriction of 80 characters
was actually a restriction on the entire command line. This
has been fixed so that the file name may be up to 80 characters
and the full command line may be more than 80 characters.
d) Fixed a bug with the CAPTURE FLUSH command.
e) If an improper format is given on the SET WRITE FORMAT command,
Dataplot will now return an error message rather than
crashing.
f) Fixed a bug in the generation of non-central chi-square,
non-central F, and doubly non-central F random numbers.
-----------------------------------------------------------------------
The following enhancements were made to DATAPLOT April-May 2003.
-----------------------------------------------------------------------
1) Added the following plot commands
PARALLEL COORDINATES PLOT Y1 ... YK
The parallel coordinates plot is a technique for plotting
multivariate data. Enter HELP PARALLEL COORDINATES PLOT
for details.
2) Added support for the following statistics:
LET A = SN SCALE Y1
LET A = QN SCALE Y1
LET A = DIFFERENCE OF SN Y1 Y2
LET A = DIFFERENCE OF QN Y1 Y2
LET P1 = 10
LET P2 = 10
Enter HELP for the given statistic for details (e.g.,
HELP DIFFERENCE OF SN).
In addition, these statistics are supported for the following
plots and commands
STATISTIC PLOT Y1 Y2 X
CROSS TABULATE STATISTIC PLOT Y1 Y2 X1 X2
BOOTSTRAP PLOT Y1 Y2 X1 X2
JACKNIFE PLOT Y1 Y2 X1 X2
TABULATE Y1 Y2 X
CROSS TABULATE Y1 Y2 X1 X2
LET Z = CROSS TABULATE Y1 Y2 X1 X2
The DIFFERENCE OF COUNTS statistic is not supported for these
plots and commands (since it will simply be zero for all
cases).
The SN SCALE and QN SCALE statistics are also supported for
the following additional commands
DEX PLOT Y X1 ... XK
BLOCK PLOT Y X1 ... XK
INFLUENCE CURVE Y
INTERACTION PLOT Y X1 X2
LET Y = MATRIX COLUMN M
LET Y = MATRIX ROW M
3) The following probability distribution commands were added:
a) The following commands for multivariate random numbers
were added:
LET W = WISHART RANDOM NUMBERS MU SIGMA N
LET U = INDEPENDENT UNIFORM RANDOM NUMBERS LOWL UPPL NP
LET M = MULTIVARIATE T RANDOM NUMBERS MU SIGMA NU N
LET M = MULTINOMIAL RANDOM NUMBERS P N NEVENTS
For details, enter
HELP WISHART RANDOM NUMBERS
HELP INDEPENDENT UNIFORM RANDOM NUMBERS
HELP MULTIVARIATE T RANDOM NUMBERS
HELP MULTINOMIAL RANDOM NUMBERS
b) The following multivariate cumulative distribution and
probability density/mass function commands were added:
LET M = MULTIVARIATE NORMAL CDF SIGMA UPPL
LET M = MULTIVARIATE NORMAL CDF SIGMA LOWL UPPL
LET M = MULTIVARIATE T CDF SIGMA UPPL
LET M = MULTIVARIATE T CDF SIGMA LOWL UPPL
LET M = MULTINOMIAL PDF X P
These compute the cdf for multivariate normal and
multivariate t distributions and the pdf for the multinomial
distribution. For details, enter
HELP MULTIVARIATE NORMAL CDF
HELP MULTIVARIATE T CDF
HELP MULTINOMIAL PDF
c) Support for the following univariate distributions was
added:
LET A = LANCDF(X) - cdf of Landau distribution
LET A = LANPDF(X) - pdf of Landau distribution
LET A = LANPPF(P) - ppf of Landau distribution
LET A = LANDIF(X) - derivative of Landau pdf
LET A = LANXM1(X) - first moment function of
Landau distribution
LET A = LANXM2(X) - second moment function of
Landau distribution
LET A = ERRCDF(X,ALPHA) - cdf of error distribution
LET A = ERRPDF(X,ALPHA) - pdf of error distribution
LET A = ERRPPF(P,ALPHA) - ppf of error distribution
LET A = SLAPDF(X) - pdf of slash distribution
LET A = IBPDF(X,ALPHA) - pdf of inverted beta distribution
The cdf and ppf functions for the slash and inverted
beta distributions are still being developed.
You can generate random numbers for these distributions
with the commands
LET Y = LANDAU RANDOM NUMBERS FOR I = 1 1 N
LET Y = SLASH RANDOM NUMBERS FOR I = 1 1 N
LET ALPHA = <value>
LET Y = ERROR RANDOM NUMBERS FOR I = 1 1 N
LET ALPHA = <value>
LET Y = INVERTED BETA RANDOM NUMBERS FOR I = 1 1 N
The error distribution is also referred to as the
Subbotin, exponential power, or general error distribution.
There are several different parameterizations of this
distribution. Dataplot uses the parameterization of
Tadikamalla in "Random Sampling From the Exponential
Power Distribution", Journal of the American Statistical
Association, September, 1980. Enter HELP ERRPDF for
details.
d) Support was added for the following random number
generators:
1) GENZ - Alan Genz generator
2) LUXURY - based on the Marsaglia and Zaman
borrow-and-carry generator. Uses a code written
by F. James and incorporating improvements by
M. Luscher.
Enter HELP RANDOM NUMBER GENERATOR for details.
4) Added the following command:
LET Y2 X2 = STACK Y1 Y2 ... YK
This command appends the variables Y1, Y2, ..., YK into
the single variable Y2. In addition, X2 contains a
group identifier variable (values corresponding to Y1 are
set to 1, values corresponding to Y2 are set to 2, and so on).
Many Dataplot commands (e.g., BOX PLOT, MEAN PLOT, ANOVA)
require data be in the two-variable format (i.e., a response
variable and a group identifier variable). However, many
data files will simply have each response variable in a
separate column. The STACK command provides a convenient
way to generate the data in the form needed by many Dataplot
commands.
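The conversion from separate response columns to the two-variable format can be sketched in Python as follows. This only illustrates the data layout described above; the function name stack is hypothetical.

```python
def stack(*columns):
    """Append the columns into one list and build a matching group-id list."""
    values = []
    group = []
    for idx, col in enumerate(columns, start=1):
        values.extend(col)
        # Values from the first column get id 1, from the second id 2, and so on.
        group.extend([idx] * len(col))
    return values, group

y1 = [1.5, 2.5]
y2 = [3.0]
y3 = [4.0, 5.0]
response, group_id = stack(y1, y2, y3)
print(response)   # [1.5, 2.5, 3.0, 4.0, 5.0]
print(group_id)   # [1, 1, 2, 3, 3]
```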
-----------------------------------------------------------------------
The following enhancements were made to DATAPLOT January-March 2003.
-----------------------------------------------------------------------
1) The Windows 95/98/ME/NT/2000/XP installation now uses
InstallShield. This should simplify the installation of
Dataplot on Windows platforms.
2) A few tweaks were made to the Postscript device.
a) Previously, Dataplot started a new page when the device
was initialized. It also started a new page when the first
plot was generated. This was to ensure that a fresh
page was started if you were generating diagrammatic
graphics before the first plot. However, it caused
a blank page to be printed for most applications.
Dataplot now automatically keeps track so that the first
plot will not generate the unneeded page erase.
b) Previously, the LANDSCAPE WORDPERFECT orientation (this
results in a landscape orientation on a portrait page)
was supported for encapsulated Postscript, but not for
regular Postscript. This orientation is now supported
for regular Postscript.
c) Dataplot allows you to switch between the various
orientations (LANDSCAPE, PORTRAIT, LANDSCAPE WORDPERFECT,
SQUARE) when using Postscript. For this reason, it sets
the bounding box for an 11x11 inch page.
The following command
SET POSTSCRIPT BOUNDING BOX <FLOAT/FIXED>
can be used to modify this behavior. If the value is
FLOAT (the default), the bounding box is set for an
11x11 inch page. If the value is set to FIXED, the
bounding box will be set according to whatever the current
orientation is when the device is initialized. However,
you should not change the orientation if FIXED is used.
If you are simply using the Postscript output for printing,
then you do not need to worry about this command. However,
it may occasionally be useful if you are importing the Postscript
output into an external program.
3) Postscript was added to the list of devices supported by
the CAPTURE HTML command (see 3) for the August-December 2002
updates).
If a DEVICE 2 CLOSE command is encountered when CAPTURE HTML
is on and the device is set to postscript, Dataplot will first
use Ghostscript to convert the Postscript output to JPEG.
The JPEG file will have the same file name as the original
postscript file, but its extension will be changed to "jpg"
(e.g., the default name "dppl1f.dat" results in a JPEG file
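called "dppl1f.jpg"). Dataplot will add an "<IMG>" tag for the
JPEG file to the generated HTML file.
A few points:
a. This assumes that Ghostscript is installed on your local
system. Use the SET GHOSTSCRIPT PATH command to specify where
Ghostscript is installed.
For example, on my Windows system, I use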
SET GHOSTSCRIPT PATH F:\GS\GS704\GS\BIN\
We suggest that you add this command to your Dataplot
startup file "dplogf.tex".
b. We suggest using either the ORIENTATION PORTRAIT or the
ORIENTATION LANDSCAPE WORDPERFECT command to set the
orientation. Plots with a landscape orientation are
rotated in the Dataplot Postscript output (in order to
make full use of the page). Currently, Ghostscript does
not support a command line switch to rotate the graph.
This means that landscape plots will be rotated vertically
on the web page (you can use an external program, GIMP for
example, to rotate the JPEG files if you like).
4) Dataplot uses a vector graphics model. However, when you want
to incorporate Dataplot graphics into other applications, it
is often preferable to work with bitmapped graphics.
Dataplot now supports the command:
SET POSTSCRIPT CONVERT <format>
where <format> is one of the following:
JPEG - for jpeg
PDF - for Portable Document Format (PDF)
TIFF - for Tiff
PBM - PBM Portable Bit Map Format (supports black and white)
PGM - PBM Portable Grey Map Format (supports grey scale)
PPM - Portable Pixmap Format (supports color)
PNM - PBM Portable Anymap Format (operates on PBM, PGM, or
PPM formats)
If <format> is set to one of the choices above, a DEVICE 2 CLOSE
command is encountered, and the device is set to postscript, Dataplot
first uses Ghostscript to convert the Postscript output to the
requested format. The converted file will have the same file name
as the original postscript file, but its extension will be changed to
"jpg", "pdf", "tif", "pbm", "pgm", "ppm", or "pnm" depending on
the value of <format>. For example, if <format> is "PDF", the default
name "dppl1f.dat" results in a PDF file called "dppl1f.pdf".
As noted above in 3), this option assumes Ghostscript is installed
on your local system. You can use the SET GHOSTSCRIPT PATH command
described above to set the path for Ghostscript.
Also, as noted in 3), we suggest using either the ORIENTATION PORTRAIT
or the ORIENTATION LANDSCAPE WORDPERFECT command to set the
orientation.
A few additional points:
a. The original postscript file is not deleted. An additional
plot file, with a different extension, is created.
b. The bit map formats are generally most useful when there is
one image per file. You can do something like the following:
SET POSTSCRIPT CONVERT JPEG
SET IPL1NA plot1.ps
DEVICE 2 POSTSCRIPT
... generate plot 1 ...
DEVICE 2 CLOSE
SET IPL1NA plot2.ps
DEVICE 2 POSTSCRIPT
... generate plot 2 ...
DEVICE 2 CLOSE
This will result in the files plot1.ps, plot1.jpg, plot2.ps, and
plot2.jpg.
The PDF files may be an exception to this. Depending on how
you want to use the generated plots, you can either
create all the plots in a single PDF file or put each plot
in a separate PDF file (using the above logic).
c. If the CAPTURE HTML switch is on, PDF files are incorporated
into the generated HTML file. For PDF files, no file
conversion is performed. Instead, a link to the PDF file is
added to the HTML page.
The advantage of the PDF format over JPEG is that it is typically
of higher quality than the JPEG file. The disadvantage is that
you have to link to another page to view it.
5) The CAPTURE HTML command can be used to save Dataplot numeric
and graphics output in an HTML page. By default, Dataplot
generates fairly minimal "header" and "footer" HTML code
(basically, it sets a white background and not much else).
If your basic purpose is to simply create a web viewable page,
then this is sufficient. However, many sites have specific style
guidelines for web pages. These can typically be incorporated into
the "header" and "footer" of the HTML page.
In order to provide additional flexibility to the appearance
of the web pages created using CAPTURE HTML, Dataplot now
supports the following two commands:
SET HTML HEADER FILE <file-name>
SET HTML FOOTER FILE <file-name>
If these commands are given, Dataplot will add the contents of
the header file to the beginning and the contents of the footer
file to the end of the generated HTML file.
The Dataplot HELP directory contains the files
"sed_header.htm" and "sed_footer.htm". These can be used as
examples for developing your own templates (these implement
some NIST specific information, so they are not intended to be
used directly by non-NIST users).
Note that Dataplot does no error checking on these files. We
recommend that you view a page containing the intended header
and footer to detect problems with your HTML code.
Dataplot will only read 240 characters per line in these files.
6) One current limitation in Dataplot has been that reading data
from ASCII files was limited to a maximum of 132 columns. The
only way around this was to use the SET READ FORMAT command. However,
this did not work if the data did not have a consistent format.
The default limit was raised to 255 columns. To read even
longer data lines, use the command MAXIMUM RECORD LENGTH.
Enter HELP MAXIMUM RECORD LENGTH for details.
7) The following commands were added:
TRIMMED MEAN CONFIDENCE LIMITS Y
MEDIAN CONFIDENCE LIMITS Y
These provide confidence intervals for robust estimates of
location. Enter
HELP TRIMMED MEAN CONFIDENCE LIMITS
HELP MEDIAN CONFIDENCE LIMITS
for details.
8) The following plot commands were added:
VIOLIN PLOT Y X
SHIFT PLOT Y X
The VIOLIN PLOT is a mix of a box plot and a kernel density
plot. The shift plot is a variation of quantile-quantile or
Tukey mean-difference plots.
Enter HELP VIOLIN PLOT and HELP SHIFT PLOT for details.
9) The Hotelling control chart capability was upgraded in the following
way:
a) A distinction is now made between phase I and phase II plots.
The previous implementation was effectively a phase I plot.
b) Support was added for the individual observations case.
Enter
HELP HOTELLING CONTROL CHART
for details.
10) The Ljung-Box test for randomness was added. This test is based
on the autocorrelation plot and is commonly used in the context
of ARIMA modeling. Enter
HELP LJUNG BOX TEST
for details.
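As background, the Ljung-Box statistic combines the first few sample autocorrelations r(k) into Q = n(n+2) * SUM[r(k)**2/(n-k)], summed over lags k = 1 to the maximum lag. The Python sketch below computes Q from that textbook definition. It is an illustration only, not Dataplot's code, and the function name and the choice of maximum lag are hypothetical.

```python
def ljung_box(x, max_lag):
    """Ljung-Box Q statistic based on the first max_lag autocorrelations."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    c0 = sum(v * v for v in d)
    total = 0.0
    for k in range(1, max_lag + 1):
        # Sample autocorrelation at lag k.
        ck = sum(d[t] * d[t + k] for t in range(n - k))
        rk = ck / c0
        total += rk * rk / (n - k)
    return n * (n + 2) * total

# A strictly alternating series is strongly (negatively) autocorrelated.
x = [1.0, -1.0] * 4
q = ljung_box(x, max_lag=1)
print(q)
```

Q is usually compared with a chi-square distribution; large values indicate that the data are not random.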
11) The following miscellaneous changes were made:
a) A correction was made in the computation of the Harrell-Davis
quantile estimate. Enter HELP QUANTILE for details.
b) The SEARCH command now returns the line number that the
first match is found on in the internal parameter
LINENUMB. This can occasionally be useful when writing
macros.
c) If no variable name is given on the READ command, Dataplot
will now try to automatically determine the variables.
There are two cases:
i) If the command SKIP AUTOMATIC was previously entered,
Dataplot will skip all lines until a line starting
with "----" is encountered. It will then back up one
line and read the variable list from that line.
This case is primarily used when reading data files
that come with the Dataplot distribution (i.e., the
files in the Dataplot "DATA" sub-directory). Most,
though not all, of these files follow that convention.
ii) If a SKIP AUTOMATIC command has not been entered,
Dataplot will read the first line of the file and
determine the number of columns of data. It will then
automatically name the variables X1 X2 ... XK (where
K is the number of variables).
Note that any SKIP, COLUMN LIMITS, or ROW LIMITS
commands will be honored when reading the first
line to determine the number of variables.
This capability only applies when reading variables (i.e.,
it is not supported for the READ PARAMETER, READ STRING,
or READ MATRIX cases). Also, it only applies when reading
from a file, not when reading from the terminal.
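The two cases above can be sketched in Python as follows. The sketch assumes whitespace-separated columns and ignores any SKIP, COLUMN LIMITS, or ROW LIMITS settings; the function name auto_variable_names is hypothetical and this is not Dataplot's actual parsing code.

```python
def auto_variable_names(lines, skip_automatic=False):
    """Determine variable names for a READ with no variable list."""
    if skip_automatic:
        # Case i: find the first line starting with "----", then back up
        # one line and take the variable names from that line.
        for i, line in enumerate(lines):
            if line.lstrip().startswith("----"):
                if i == 0:
                    raise ValueError("no line precedes the ---- line")
                return lines[i - 1].split()
        raise ValueError("no line starting with ---- was found")
    # Case ii: count the columns on the first line and name them X1 ... XK.
    k = len(lines[0].split())
    return ["X%d" % (j + 1) for j in range(k)]

data_file = ["Description of the data set",
             "Y X1 X2",
             "-----------",
             "1.0 2.0 3.0"]
print(auto_variable_names(data_file, skip_automatic=True))   # ['Y', 'X1', 'X2']
print(auto_variable_names(["1.0 2.0 3.0", "4.0 5.0 6.0"]))   # ['X1', 'X2', 'X3']
```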
d) Some bugs were fixed.
12) Added support for the following statistics:
LET A = DIFFERENCE OF MEANS Y1 Y2
LET A = DIFFERENCE OF MIDMEANS Y1 Y2
LET A = DIFFERENCE OF MEDIANS Y1 Y2
LET A = DIFFERENCE OF MIDRANGE Y1 Y2
LET A = DIFFERENCE OF TRIMMED MEANS Y1 Y2
LET A = DIFFERENCE OF WINSORIZED MEANS Y1 Y2
LET A = DIFFERENCE OF GEOMETRIC MEANS Y1 Y2
LET A = DIFFERENCE OF HARMONIC MEANS Y1 Y2
LET A = DIFFERENCE OF HODGES-LEHMAN Y1 Y2
LET A = DIFFERENCE OF BIWEIGHT LOCATIONS Y1 Y2
LET A = DIFFERENCE OF STANDARD DEVIATIONS Y1 Y2
LET A = DIFFERENCE OF VARIANCES Y1 Y2
LET A = DIFFERENCE OF AAD Y1 Y2
LET A = DIFFERENCE OF MAD Y1 Y2
LET A = DIFFERENCE OF INTERQUARTILE RANGE Y1 Y2
LET A = DIFFERENCE OF WINSORIZED SD Y1 Y2
LET A = DIFFERENCE OF WINSORIZED VARIANCE Y1 Y2
LET A = DIFFERENCE OF BIWEIGHT MIDVARIANCE Y1 Y2
LET A = DIFFERENCE OF BIWEIGHT SCALE Y1 Y2
LET A = DIFFERENCE OF PERCENTAGE BEND MIDVARIANCE Y1 Y2
LET A = DIFFERENCE OF GEOMETRIC SD Y1 Y2
LET A = DIFFERENCE OF RANGE Y1 Y2
LET A = DIFFERENCE OF SKEWNESS Y1 Y2
LET A = DIFFERENCE OF KURTOSIS Y1 Y2
LET A = DIFFERENCE OF RELATIVE SD Y1 Y2
LET A = DIFFERENCE OF COEFFICIENT OF VARIATION Y1 Y2
LET A = DIFFERENCE OF SD OF MEAN Y1 Y2
LET A = DIFFERENCE OF RELATIVE VARIANCE Y1 Y2
LET A = DIFFERENCE OF VARIANCE OF MEAN Y1 Y2
LET A = DIFFERENCE OF QUANTILE Y1 Y2
LET A = DIFFERENCE OF MINIMUM Y1 Y2
LET A = DIFFERENCE OF MAXIMUM Y1 Y2
LET A = DIFFERENCE OF EXTREME Y1 Y2
LET A = DIFFERENCE OF SUM Y1 Y2
LET A = DIFFERENCE OF COUNTS Y1 Y2
Enter HELP for the given statistic for details (e.g.,
HELP DIFFERENCE OF MEANS).
In addition, these statistics are supported for the following
plots and commands
STATISTIC PLOT Y1 Y2 X
CROSS TABULATE STATISTIC PLOT Y1 Y2 X1 X2
BOOTSTRAP PLOT Y1 Y2 X1 X2
JACKNIFE PLOT Y1 Y2 X1 X2
TABULATE Y1 Y2 X
CROSS TABULATE Y1 Y2 X1 X2
LET Z = CROSS TABULATE Y1 Y2 X1 X2
The DIFFERENCE OF COUNTS statistic is not supported for these
plots and commands (since it will simply be zero for all
cases).
----------------------------------------------------------------------
The following enhancements were made to DATAPLOT August-December 2002.
----------------------------------------------------------------------
1) Added the following command:
AUTO TEXT <ON/OFF>
Entering AUTO TEXT ON will prepend a TEXT to all subsequent
lines until an AUTO TEXT OFF command is encountered. This
command is used in generating word slides. Enter
HELP AUTO TEXT
for details.
2) The list of supported statistics has been expanded for the
following commands:
BLOCK PLOT
DEX PLOT
TABULATE
CROSS TABULATE
MATRIX ROW STATISTIC
MATRIX COLUMN STATISTIC
CROSS TABULATE (LET)
Enter the corresponding HELP command for a complete list
of supported statistics.
3) The CAPTURE command added the following option:
CAPTURE HTML <file-name>
This writes the output from the CAPTURE command in HTML
format. Note that most commands simply use a
<PRE> ... </PRE> syntax. Currently, the exceptions are the
TABULATE and CROSS TABULATE commands, which write the output using
HTML table syntax.
This can be used in conjunction with the WEB command. For
example,
SKIP 25
READ RIPKEN.DAT Y X1 X2
ECHO ON
CAPTURE HTML C:\TABLE.HTM
TABULATE MEAN Y X1
CROSS TABULATE MEAN Y X1 X2
END OF CAPTURE
WEB file://C:\TABLE.HTM
In addition, if DEVICE 2 is set to PNG, JPEG, or SVG, Dataplot
will incorporate the graphics into the web page using the
IMG tag. For example,
device 1 x11
.
skip 25
read berger1.dat y x
.
line blank solid
character x blank
echo on
capture html fit.htm
set ipl1na data.png
device 2 gd png
title original data
plot y x
device 2 close
fit y x
set ipl1na pred.png
device 2 gd png
title predicted line
plot y pred vs x
device 2 close
end of capture
.
web file:///home/heckert/dataplot/solaris/fit.htm
4) The maximum number of lines in a loop was raised from 500 to
1,000.
----------------------------------------------------------------------
The following enhancements were made to DATAPLOT April-July 2002.
----------------------------------------------------------------------
1) Added support for the following probability distribution
functions.
a) Two-Sided Power
TSPCDF(X,THETA,N)
TSPPDF(X,THETA,N)
TSPPPF(P,THETA,N)
LET THETA = <value>
LET N = <value>
LET Y = TWO-SIDED POWER RANDOM NUMBERS FOR I = 1 1 100
LET THETA = <value>
LET N = <value>
TWO-SIDED POWER PROBABILITY PLOT Y
TWO-SIDED POWER PPCC PLOT Y
LET THETA = <value>
LET N = <value>
CHI-SQUARE TWO-SIDED POWER GOODNESS OF FIT TEST Y
LET THETA = <value>
LET N = <value>
KOLMOGOROV-SMIRNOV TWO-SIDED POWER GOODNESS OF FIT TEST Y
LET A = <lower limit>
LET B = <upper limit>
TWO-SIDED POWER MAXIMUM LIKELIHOOD Y
Note: The MLE estimator assumes that the value of the lower
and upper limits (default to 0 and 1) are known and fixed.
It returns estimates for THETA and N.
b) Bi-Weibull
BWECDF(X,SCALE1,GAMMA1,LOC2,SCALE2,GAMMA2)
BWEPDF(X,SCALE1,GAMMA1,LOC2,SCALE2,GAMMA2)
BWEPPF(P,SCALE1,GAMMA1,LOC2,SCALE2,GAMMA2)
BWEHAZ(X,SCALE1,GAMMA1,LOC2,SCALE2,GAMMA2)
BWECHAZ(X,SCALE1,GAMMA1,LOC2,SCALE2,GAMMA2)
LET SCALE1 = <value>
LET GAMMA1 = <value>
LET LOC2 = <value>
LET SCALE2 = <value>
LET GAMMA2 = <value>
LET Y = BIWEIBULL RANDOM NUMBERS FOR I = 1 1 100
LET SCALE1 = <value>
LET GAMMA1 = <value>
LET LOC2 = <value>
LET SCALE2 = <value>
LET GAMMA2 = <value>
BIWEIBULL PROBABILITY PLOT Y
LET SCALE1 = <value>
LET GAMMA1 = <value>
LET LOC2 = <value>
LET SCALE2 = <value>
LET GAMMA2 = <value>
CHI-SQUARE BIWEIBULL GOODNESS OF FIT TEST Y
LET SCALE1 = <value>
LET GAMMA1 = <value>
LET LOC2 = <value>
LET SCALE2 = <value>
LET GAMMA2 = <value>
KOLMOGOROV-SMIRNOV BIWEIBULL GOODNESS OF FIT TEST Y
c) Multivariate normal distribution
LET MU = DATA <list of p means>
READ MATRIX SIGMA
<pxp set of values>
END OF DATA
LET N = <value>
LET M = MULTIVARIATE NORMAL RANDOM NUMBERS MU SIGMA N
Note that M will be an NxP matrix. N is the number of rows
generated for each component and there are P components to
the multivariate normal. SIGMA is the pxp variance-covariance
matrix of the multivariate normal. SIGMA will be checked to
ensure that it is a positive definite matrix. MU is a vector
specifying the means of the p components.
This command utilizes a code written by Charlie Reeves when
he was a member of the NIST Statistical Engineering Division.
d) Multinomial distribution
LET P = DATA <list of probabilities that sum to 1>
LET NEVENTS = <value>
LET NCAT = SIZE P
LET N = <value>
LET M = MULTINOMIAL RANDOM NUMBERS P NEVENTS NCAT N
Note that M will be an NxNCAT matrix. N is the number of rows
generated and there are NCAT components (one for each category)
to the multinomial. P is a vector specifying the probabilities
of the NCAT categories and NEVENTS is the number of events
(trials) distributed among the categories for each row.
e) Logarithmic series distribution
Added random number generation for this distribution. For
example,
LET THETA = 0.7
LET Y = LOGARITHMIC SERIES RANDOM NUMBERS FOR I = 1 1 500
The cdf, pdf, and ppf functions are already available for
this distribution.
2) Made the following updates to the FIT command:
a) Added the command:
SET FIT ADDITIVE CONSTANT <ON/OFF>
If OFF, then Dataplot does not include a constant term
in a multi-linear fit (i.e., FIT Y X1 X2 ...). The
default is to include the additive constant.
b) If Dataplot detects a singularity in a multi-linear fit,
it now prints an error message. Previously, it simply
set all the parameter estimates to 0 and terminated the
fit. In addition, Dataplot explicitly checks for two
types of singularities: a column that contains all the same
values (this essentially adds an additional constant term) and
for two columns being equal.
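The two singularity checks described above (a constant column and two equal columns) can be sketched in Python as follows. This is an illustration of the idea only, assuming exact equality comparisons; the function name find_singularities and the returned tuple format are hypothetical.

```python
def find_singularities(columns):
    """Return a list of problems found among the independent variables."""
    problems = []
    # Check 1: a column whose values are all the same duplicates the
    # constant term of the fit.
    for i, col in enumerate(columns):
        if len(set(col)) == 1:
            problems.append(("constant", i))
    # Check 2: two columns that are equal element by element.
    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):
            if list(columns[i]) == list(columns[j]):
                problems.append(("equal", i, j))
    return problems

x1 = [1.0, 2.0, 3.0]
x2 = [5.0, 5.0, 5.0]
x3 = [1.0, 2.0, 3.0]
print(find_singularities([x1, x2, x3]))
# [('constant', 1), ('equal', 0, 2)]
```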
c) Added the command:
LET M = CREATE MATRIX X1 ... XK
where X1 ... XK designates a list of previously defined
variables.
This command is similar in function to the MATRIX DEFINITION
command. However, the MATRIX DEFINITION command
creates matrices from variables that are contiguous
(the order of variables is determined by the order
in which they were created in Dataplot). The
CREATE MATRIX command does not have this restriction.
The variables need not be contiguous.
This command is useful for creating a design matrix
in regression problems that can be used as input for
some of the new commands that follow.
d) Added the command:
LET C = CATCHER MATRIX X
This computes the catcher matrix, X*(X'X)**(-1). This
matrix is used in the computation of certain regression
diagnostics (e.g., Variance Inflation Factors, Partial
Regression Plots). This command greatly simplifies the
writing of macros to generate these regression diagnostics
(and allows larger design matrices to be used). Enter
HELP CATCHER MATRIX for details.
e) Added the command:
LET XTXINV = XTXINV MATRIX X
This computes the matrix (X'X)**(-1). This
matrix is used in the computation of certain regression
diagnostics (e.g., DFBETA statistic) and in computing
certain confidence and prediction intervals for multi-linear
fits. This command simplifies the writing of macros to
generate these regression diagnostics and intervals
(and allows larger design matrices to be used). Enter
HELP XTXINV MATRIX for details.
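The matrices described in d) and e) can be illustrated outside of
Dataplot. The following is a minimal Python/NumPy sketch of the two
formulas, X*(X'X)**(-1) and (X'X)**(-1). It is not Dataplot's
implementation; the function names and the small design matrix are
hypothetical and are used only for illustration.

```python
import numpy as np

def xtx_inverse(x):
    """Return (X'X)**(-1) for a full-rank design matrix X."""
    return np.linalg.inv(x.T @ x)

def catcher_matrix(x):
    """Return the catcher matrix X*(X'X)**(-1)."""
    return x @ xtx_inverse(x)

# Hypothetical design matrix: a column of 1's plus two predictors.
x = np.array([[1.0, 1.0, 2.0],
              [1.0, 2.0, 1.0],
              [1.0, 3.0, 5.0],
              [1.0, 4.0, 3.0],
              [1.0, 5.0, 4.0]])
y = np.array([3.1, 3.9, 8.2, 7.8, 9.1])

c = catcher_matrix(x)
# The transposed catcher matrix maps the response to the least
# squares estimates: (X'X)**(-1) X'y.
beta = c.T @ y
```

The catcher matrix has the same dimensions as X, which is why it is
convenient for diagnostics that are computed row by row.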
f) Added the command:
LET C = CONDITION INDICES X
where X is the design matrix for a multi-linear fit
(note that you need to create the independent variables,
including a column containing all 1's, as a matrix).
The condition indices provide a measure of collinearity
in the design matrix. Enter HELP CONDITION INDICES for
details.
g) Added the command:
LET VIF = VARIANCE INFLATION FACTORS X
where X is the design matrix for a multi-linear fit
(note that you need to create the independent variables,
including a column containing all 1's, as a matrix).
The variance inflation factors provide a measure of
collinearity in the design matrix. Enter
HELP VARIANCE INFLATION FACTORS for details.
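As a rough illustration of the two collinearity measures in f) and
g), the following Python/NumPy sketch uses common textbook
definitions: the variance inflation factors are taken as the
diagonal of the inverse correlation matrix of the predictors, and
the condition indices as ratios of singular values of the
column-scaled design matrix. These definitions are assumptions;
the Dataplot commands may scale or order the results differently.

```python
import numpy as np

def variance_inflation_factors(x):
    """VIFs for the non-constant columns of X.

    Column 0 of X is assumed to be the column of 1's.
    """
    predictors = x[:, 1:]
    corr = np.corrcoef(predictors, rowvar=False)
    return np.diag(np.linalg.inv(corr))

def condition_indices(x):
    """Largest singular value divided by each singular value.

    The columns of X are first scaled to unit length.
    """
    scaled = x / np.linalg.norm(x, axis=0)
    s = np.linalg.svd(scaled, compute_uv=False)
    return s.max() / s

# Hypothetical design matrix: a column of 1's plus two predictors.
x = np.array([[1.0, 1.0, 2.0],
              [1.0, 2.0, 1.0],
              [1.0, 3.0, 5.0],
              [1.0, 4.0, 3.0],
              [1.0, 5.0, 4.0]])

vif = variance_inflation_factors(x)
ci = condition_indices(x)
```

With two predictors whose correlation is r, both variance inflation
factors equal 1/(1 - r**2).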
h) Added the following plot commands:
PARTIAL REGRESSION PLOT Y X1 ... XK XI
PARTIAL RESIDUAL PLOT Y X1 ... XK XI
PARTIAL LEVERAGE PLOT Y X1 ... XK XI
CCPR PLOT Y X1 ... XK XI
MATRIX PARTIAL REGRESSION PLOT Y X1 ... XK
MATRIX PARTIAL RESIDUAL PLOT Y X1 ... XK
MATRIX PARTIAL LEVERAGE PLOT Y X1 ... XK
MATRIX CCPR PLOT Y X1 ... XK
These generate partial regression plots, partial residual
plots, partial leverage plots, and component and
component-plus-residual (CCPR) plots for a multi-linear fit.
These plots are typically used to assess the effect of
a variable on the fit given the effect of other variables
already included in the fit.
There are 2 forms for the command.
In the first form, a single plot is generated. In this case,
the last variable listed is the "primary" variable. That is,
this is the variable we are considering adding/deleting from
the fit. Note that this variable should also appear in the
list X1 ... XK. That is, a fit of Y versus X1 to XK is
performed (including XI), then the plot assesses the effect
of XI on the fit.
In the second form, a multiplot is generated where each
of the independent variables is used as the primary variable.
Enter
HELP PARTIAL REGRESSION PLOT
HELP PARTIAL RESIDUAL PLOT
HELP PARTIAL LEVERAGE PLOT
HELP CCPR PLOT
for details.
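To make the idea of a "primary" variable concrete, the following
Python/NumPy sketch computes the coordinates of a partial
regression plot using the usual textbook construction: the
residuals of Y on the other variables are plotted against the
residuals of XI on the other variables. This is an assumed
definition for illustration only; the function names are
hypothetical and the Dataplot commands may differ in detail.

```python
import numpy as np

def residuals(y, x):
    """Residuals from a least squares fit of y on the columns of x."""
    coef = np.linalg.lstsq(x, y, rcond=None)[0]
    return y - x @ coef

def partial_regression_coordinates(y, x, i):
    """Return (horizontal, vertical) coordinates for column i of x."""
    others = np.delete(x, i, axis=1)
    horizontal = residuals(x[:, i], others)   # XI adjusted for the rest
    vertical = residuals(y, others)           # Y adjusted for the rest
    return horizontal, vertical

# Hypothetical data: a column of 1's plus two predictors.
x = np.array([[1.0, 1.0, 2.0],
              [1.0, 2.0, 1.0],
              [1.0, 3.0, 5.0],
              [1.0, 4.0, 3.0],
              [1.0, 5.0, 4.0]])
y = np.array([3.1, 3.9, 8.2, 7.8, 9.1])

h, v = partial_regression_coordinates(y, x, 2)
slope = (h @ v) / (h @ h)
full_fit = np.linalg.lstsq(x, y, rcond=None)[0]
```

The slope of a line fit through these points equals the coefficient
of the primary variable in the full fit, which is what makes the
plot useful for judging that variable's contribution.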
i) For multi-linear fits, the output for DPST2F.DAT was
enhanced to include Bonferroni and Hotelling joint
confidence limits, respectively, for the predicted values.
By default, a 95% interval is generated. To use a different
alpha value, enter the following command before the fit:
LET ALPHA = 0.90
In addition, the output for DPST1F.DAT now includes
the t critical value and lower and upper joint Bonferroni
confidence limits for the parameters. The format 5E15.7
is used in writing these values.
For multi-linear fits, the regression ANOVA table is now
written to the file DPST5F.DAT. The values for R**2,
adjusted R**2, and the Press P statistic are also printed
to this file. These three statistics are
saved as the internal parameters RSQUARE, ADJRSQUA, and PRESSP,
respectively.
j) One weakness in the Dataplot multi-linear fit routine
has been the lack of any "forward selection/backward
selection/best subsets" capabilities.
The command
BEST CP Y X1 ... XK
was added to identify the best candidate models using
the Mallows' CP criterion. Enter HELP BEST CP for details.
k) Added the command:
BOOTSTRAP FIT Y X1 ... XK
This performs a bootstrap linear/multilinear fit. Bootstrap
linear fits are an alternative to weighting and transformation
when the assumptions for multilinear fitting (that is, that
the errors from the fit are independent and have a common
distribution, typically assumed to be normal, with common
location and scale) are not satisfied. Enter
HELP BOOTSTRAP FIT for details.
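The general idea of a bootstrap fit can be sketched in
Python/NumPy as follows. This sketch resamples the residuals of
the original fit, which is one common bootstrap scheme; whether
BOOTSTRAP FIT resamples residuals or data rows is not stated here,
so treat the scheme, the function name, and the data as
assumptions.

```python
import numpy as np

def bootstrap_fit(y, x, n_boot=200, seed=0):
    """Residual bootstrap for a linear least squares fit."""
    rng = np.random.default_rng(seed)
    coef = np.linalg.lstsq(x, y, rcond=None)[0]
    fitted = x @ coef
    resid = y - fitted
    boot = np.empty((n_boot, x.shape[1]))
    for b in range(n_boot):
        # Build a synthetic response from resampled residuals and refit.
        y_star = fitted + rng.choice(resid, size=resid.size, replace=True)
        boot[b] = np.linalg.lstsq(x, y_star, rcond=None)[0]
    return coef, boot

# Hypothetical data: straight line plus fixed "noise".
t = np.arange(10.0)
x = np.column_stack([np.ones_like(t), t])
noise = np.array([0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.4, -0.3, 0.1])
y = 2.0 + 0.5 * t + noise

coef, boot = bootstrap_fit(y, x)
# Percentile intervals for each parameter.
lower, upper = np.percentile(boot, [2.5, 97.5], axis=0)
```

The spread of the bootstrap estimates gives intervals for the
parameters that do not rely on the normality assumption.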
3) Added support for alternative random number generators. Note
that the default generator (i.e., the one that has been in
Dataplot for many years) is based on a Fibonacci sequence as
defined by Marsaglia. Note that this is equivalent to the
generator UNI of Jim Blue, David Kahaner, and George Marsaglia
that is in the CMLIB library.
Support is now provided for a linear congruential generator
written by Fullerton (CMLIB routine RUNIF) and a multiplicative
congruential generator (ACM algorithm 599). In addition,
2 generators based on the generalized feedback shift
register (GFSR) methods are supported. The first is based on the
original algorithm of Lewis and Payne (Journal of the ACM,
Volume 20, pp. 456-468). The second is an alternative
implementation given by Fushimi and Tezuka (Communications
of the ACM, Volume 26, pp. 516-523). Both are based on codes
given by Monahan (2000) in "Numerical Methods of Statistics".
Support is also provided for the Applied Statistics algorithm
183. AS183 is based on the fractional part of the sum of 3
multiplicative congruential generators. It requires 3 integers
be specified initially. Dataplot uses the multiplicative
congruential generator (which does depend on the SEED command)
to randomly generate these 3 integers.
These 6 generators are used to generate uniform random numbers.
Random numbers for other distributions are then derived from
these uniform random numbers.
To specify the uniform random number generator, use the command
SET RANDOM NUMBER GENERATOR FIBONACCI
SET RANDOM NUMBER GENERATOR LINEAR CONGRUENTIAL
SET RANDOM NUMBER GENERATOR MULTIPLICATIVE CONGRUENTIAL
SET RANDOM NUMBER GENERATOR GFSR
SET RANDOM NUMBER GENERATOR FUSHIMI
SET RANDOM NUMBER GENERATOR AS183
Note that you can use the SEED command to change the random numbers
generated as well. The SEED does not apply to the 2 GFSR
generators (these each have their own initialization routines).
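For readers unfamiliar with AS183, the following Python sketch
shows the published Wichmann-Hill recurrences: three small
multiplicative congruential generators whose scaled values are
summed and reduced to their fractional part. It is meant only as
an illustration of the method; the starting integers below are
hypothetical (Dataplot generates its own, as described above).

```python
def as183(ix, iy, iz, n):
    """Generate n uniform (0,1) numbers with the AS 183 recurrences.

    ix, iy, iz are the three starting integers (each between
    1 and 30000).
    """
    out = []
    for _ in range(n):
        ix = (171 * ix) % 30269
        iy = (172 * iy) % 30307
        iz = (170 * iz) % 30323
        # Fractional part of the sum of the three scaled values.
        u = (ix / 30269.0 + iy / 30307.0 + iz / 30323.0) % 1.0
        out.append(u)
    return out

u = as183(1, 2, 3, 5)
```

Uniform numbers produced this way can then be transformed to other
distributions, for example by applying a percent point function.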
4) Added support for the following special functions.
a) Fermi-Dirac function
FERMDIRA(X,ORDER)
where ORDER is the order of the function. ORDER can be
-0.5, 0.5, 1.5, or 2.5. Dataplot uses an epsilon of 0.1;
any order not within epsilon of one of the above values
results in an error. Enter HELP FERMDIRA for details.
5) Added support for the following statistics:
LET A = WINSORIZED VARIANCE Y
LET A = WINSORIZED SD Y
LET A = WINSORIZED COVARIANCE Y X
LET A = WINSORIZED CORRELATION Y X
LET A = BIWEIGHT MIDVARIANCE Y
LET A = BIWEIGHT MIDCOVARIANCE Y X
LET A = BIWEIGHT MIDCORRELATION Y X
LET A = PERCENTAGE BEND MIDVARIANCE Y
LET A = PERCENTAGE BEND CORRELATION Y1 Y2
LET A = HODGES LEHMAN Y
LET A = TRIMMED MEAN STANDARD ERROR Y
LET A = <XQ> QUANTILE Y
LET A = <XQ> QUANTILE STANDARD ERROR Y
Enter
HELP WINSORIZED VARIANCE
HELP WINSORIZED SD
HELP WINSORIZED COVARIANCE
HELP WINSORIZED CORRELATION
HELP BIWEIGHT MIDVARIANCE
HELP BIWEIGHT MIDCOVARIANCE
HELP BIWEIGHT MIDCORRELATION
HELP PERCENTAGE BEND MIDVARIANCE
HELP PERCENTAGE BEND CORRELATION
HELP HODGES LEHMAN
HELP TRIMMED MEAN STANDARD ERROR
HELP QUANTILE
HELP QUANTILE STANDARD ERROR
for details.
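Two of these robust statistics are simple enough to sketch in
Python/NumPy. The definitions below are common textbook ones and
are assumptions here: the Winsorizing proportion of 0.2 is a
hypothetical default, and some definitions of the Hodges-Lehmann
estimate exclude the pairs with i = j. Consult the HELP entries
for the definitions Dataplot actually uses.

```python
import numpy as np

def winsorized_variance(y, proportion=0.2):
    """Variance after pulling each tail in to its nearest kept value."""
    y = np.sort(np.asarray(y, dtype=float))
    g = int(np.floor(proportion * y.size))   # points replaced per tail
    if g > 0:
        y[:g] = y[g]
        y[-g:] = y[-g - 1]
    return y.var(ddof=1)

def hodges_lehmann(y):
    """Median of the pairwise (Walsh) averages, pairs i <= j."""
    y = np.asarray(y, dtype=float)
    i, j = np.triu_indices(y.size)
    return np.median((y[i] + y[j]) / 2.0)

wv = winsorized_variance([1, 2, 3, 4, 100])
hl = hodges_lehmann([1, 2, 3])
```

In the first call the outlying value 100 is replaced by 4 before
the variance is computed, so it no longer dominates the result.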
6) Added the following plot:
<stat> INFLUENCE CURVE Y XSEQ
where <stat> is one of the built-in supported statistics,
Y is a response variable, and XSEQ is a sequence of x values.
The plot is generated by looping through the values in XSEQ.
For a given value of XSEQ, the value of <stat> is computed for
that value of XSEQ along with the values in Y. The vertical
axis of the plot contains the computed statistic while the
horizontal axis contains the value of XSEQ.
This plot is of interest in the field of robust statistics.
For details, enter HELP INFLUENCE CURVE.
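The looping described above can be sketched in Python/NumPy as
follows. The function name is hypothetical and the sketch only
computes the plot coordinates; it is an illustration of the
description, not Dataplot's code.

```python
import numpy as np

def influence_curve(stat, y, xseq):
    """Value of stat for Y with each value of XSEQ appended in turn."""
    y = np.asarray(y, dtype=float)
    return np.array([stat(np.append(y, x)) for x in xseq])

y = [1, 2, 3, 4, 5]
xseq = [-100.0, 0.0, 100.0]

mean_curve = influence_curve(np.mean, y, xseq)
median_curve = influence_curve(np.median, y, xseq)
```

For a statistic that is not robust, such as the mean, the curve is
unbounded as the added point moves away from the data. For a
robust statistic, such as the median, the curve stays bounded.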
7) For the ANOVA command, the residual standard deviations for
various models are written to the file DPST3F.DAT (these are
the same values that appear in the fitted output). This
allows these values to be read back in as a variable, which
is occasionally useful in writing macros that involve an
ANOVA step.
8) The PROBE command now recognizes the following:
PROBE IDMAN(1)
PROBE IDMAN(2)
PROBE IDMAN(3)
This identifies the current manufacturer for devices 1, 2, and
3, respectively. In addition, the value of PROBEVAL is set
if the returned manufacturer is one of the following:
X11 = 1
QWIN = 2
REGI = 3
TEKT = 4
OPGL = 5
QUAR or MACI = 6
POST or PS = 7
HP or HPGL = 8
GENE = 9
GD = 10
QUIC = 11
CALC = 12
ZETA = 13
GKS = 14
LAHE = 15
PRIN = 16
LATE = 17
SVG = 18
DISC = 19
In addition, the device model can be extracted via the commands
PROBE IDMOD(1)
PROBE IDMOD(2)
PROBE IDMOD(3)
PROBE IDMO2(1)
PROBE IDMO2(2)
PROBE IDMO2(3)
PROBE IDMO3(1)
PROBE IDMO3(2)
PROBE IDMO3(3)
The following PROBE commands were added to return the
operating system and compiler, respectively.
PROBE IOPSY1
PROBE ICOMPI
For IOPSY1, the value of PROBEVAL is also set:
UNIX = 1 (Unix)
PC-D = 2 (Windows)
VMS = 3 (VAX/VMS)
other = 0
For ICOMPI, the value of PROBEVAL is also set:
f77 = 1 (the Unix Fortran compiler)
MS-F = 2 (the Microsoft, now Compaq, Fortran compiler)
LAHE = 3 (the Lahey Fortran compiler)
other = 0
In general, if the PROBE command returns a string value of ON,
OPENED, or YES, it sets the value of the PROBEVAL parameter to 1.
Similarly, if the PROBE command returns a string value of OFF,
CLOSED, or NO, it sets the value of the PROBEVAL parameter to 0.
The above uses of PROBE are primarily of value in writing
general purpose macros, in particular macros that are
intended to be used by others.
9) The following command was added:
CAPTURE FLUSH
The purpose of this command is to allow Dataplot text output
to be written to the graphics output file. This can be useful
when you are writing a macro and you want the analytic output
(for example, the output from a fit) to be included with the
graphics output. The following shows a sample of how this
command is used:
device 1 x11
device 2 postscript
.
title automatic
skip 25
read gear.dat y x
.
mean plot y x
.
move 5 95
margin 5
capture junk.dat
tabulate mean y x
capture flush
end of capture
.
device 2 close
system lpr dppl1f.dat
The initial CAPTURE command directs text output to the
file "junk.dat". When the CAPTURE FLUSH command is
encountered, the capture file is closed, an ERASE command
is generated for the graphics devices, the contents of
the capture file are printed on the graphics devices using
the TEXT command (i.e., each line of the file generates a
distinct TEXT command), and then the capture file is re-opened
(it will start at the beginning).
Since the lines are generated with the TEXT command, the
appearance of the text can be controlled with the various
TEXT attribute commands. Also, it is recommended that
CRLF be set to ON (the default), a MOVE command be given to
set the position for the first line of the text, and a MARGIN
command be entered to set the beginning x-coordinate for the
line.
Some output may be too long to display on one page. You
can control the number of lines printed per page with the
following command:
SET CAPTURE LINES <value1> ... <value5>
Up to 5 values may be entered. The first value is for the
first page of output, the second value is for the second
page of output, and so on. If more than 5 pages are
generated, then the page limits start over (i.e., page 6 uses
the value for page 1, page 7 uses the value for page 2, and
so on). The default is 25 lines for all pages.
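The way the page limits cycle can be sketched in Python as follows.
This is only an illustration of the rule described above, with a
hypothetical function name; it is not Dataplot's code.

```python
def paginate(lines, page_limits=(25,)):
    """Split lines into pages, cycling through the page limits."""
    pages = []
    start = 0
    page = 0
    while start < len(lines):
        # Page 6 reuses the limit for page 1, page 7 that for
        # page 2, and so on, when 5 limits are given.
        limit = page_limits[page % len(page_limits)]
        pages.append(lines[start:start + limit])
        start += limit
        page += 1
    return pages

lines = [f"line {i}" for i in range(200)]
pages = paginate(lines, page_limits=(10, 20, 30, 40, 50))
page_lengths = [len(p) for p in pages]
```

With 200 lines and limits of 10, 20, 30, 40, and 50, the pages hold
10, 20, 30, 40, 50, 10, 20, and 20 lines, respectively.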
If the MULTIPLOT switch is ON, the initial page erase is
suppressed. The following example shows how this feature
can be used:
.
device 1 x11
device 2 ps
device 1 font simplex
.
title automatic
skip 25
read gear.dat y x
.
multiplot 2 2
multiplot corner coordinates 0 0 100 100
multiplot scale factor 2
.
mean plot y x
sd plot y x
.
move 5 98
margin 5
plot
capture junk.dat
tabulate mean y x
capture flush
end of capture
move 5 98
plot
capture junk.dat
tabulate sd y x
capture flush
end of capture
.
end of multiplot
.
Note that the null PLOT command is used to move to the
next plot area without actually generating a plot.
This example draws a mean and standard deviation plot
on the first row and then supplements that with the numeric
values generated using the TABULATE command on the second
row.
The following two commands are also available.
SET CAPTURE NUMBER <ON/OFF>
SET CAPTURE BOX <ON/OFF>
If SET CAPTURE NUMBER ON is entered, the output lines are
numbered. This is primarily a convenience function to help
determine what values to enter for the SET CAPTURE LINES command
in order to generate breaks at the appropriate spots.
If SET CAPTURE BOX ON is entered, a box will be drawn for each
page of the output. Use the BOX 1 CORNER COORDINATES command,
before the CAPTURE FLUSH, to specify the coordinates of the
box. Use the various BOX attribute commands to set the
properties of the box.
10) The following enhancements were made to the IF command:
a) You can now test for strings with the IF command. That is,
LET STRING S = TEST
IF S = TEST
PRINT S
END OF IF
LET STRING S = TEST
IF S <> "NOT TEST"
PRINT S
END OF IF
Note that "=" and "<>" are the only comparisons allowed (i.e.,
no "<" or ">").
The argument on the left of the "=" must be the name of a
previously defined string. The argument to the right of the
"=" is a literal string. The string can be enclosed in
double quotes, ", if it contains spaces. If there are no
double quotes, the string is assumed to end once the first
space is encountered.
b) Support was added for ELSE and ELSE IF clauses. For
example,
IF A = 2
PRINT "A = 2"
ELSE
PRINT "A NOT EQUAL 2"
END OF IF
or
IF A = 2
PRINT "A = 2"
ELSE IF A = 1
PRINT "A = 1"
ELSE
PRINT "A NOT EQUAL 2 AND A NOT EQUAL 1"
END OF IF
c) A bug was fixed for the IF ... NOT EXIST and IF ... EXIST
cases. Also, these now test whether the name exists as a
parameter, string, variable, or matrix (previously, it only
checked if it was a parameter).
11) One problem with reading files in Dataplot has been the
inability to handle directory and file names with embedded
spaces. The command
SET FILE NAME QUOTE <ON/OFF>
was added to address this problem. If ON is specified,
then the file name may be enclosed in double quotes (").
All text, including spaces, until the matching ending double
quote is found is considered a part of the file name (no
provision is made for file names containing a double quote
character). If OFF is specified, this feature is disabled.
The default is OFF to accommodate quoted strings on the WRITE
command that might contain a "." (which is what Dataplot uses to
identify a file name). For example,
WRITE "Example of writing a string."
The following will work as intended:
SET FILE NAME QUOTE ON
WRITE "C:\My Data\STRING.OUT" "String to STRING.OUT"
12) Modified the output for the SIGN TEST, SIGNED RANK TEST, and
the RANK SUM test to have better clarity.
13) Added the following to the BOOTSTRAP PLOT command:
BOOTSTRAP CORRELATION PLOT Y X
BOOTSTRAP RANK COVARIANCE PLOT Y X
BOOTSTRAP RANK CORRELATION PLOT Y X
BOOTSTRAP COVARIANCE PLOT Y X
BOOTSTRAP LINEAR CALIBRATION PLOT Y X
BOOTSTRAP QUADRATIC CALIBRATION PLOT Y X
14) Fixed several bugs.
----------------------------------------------------------------------
The following enhancements were made to DATAPLOT November 2001 - March 2002.
----------------------------------------------------------------------
1) Added the following probability distributions.
a) Geometric Extreme Exponential
GEECDF(X,GAMMA)
GEEPDF(X,GAMMA)
GEEPPF(P,GAMMA)
GEEHAZ(X,GAMMA)
GEECHAZ(X,GAMMA)
LET GAMMA = <value>
LET Y = GEOMETRIC EXTREME EXPONENTIAL RANDOM NUMBERS FOR I = 1 1 100
LET GAMMA = <value>
GEOMETRIC EXTREME EXPONENTIAL PROBABILITY PLOT Y
GEOMETRIC EXTREME EXPONENTIAL PPCC PLOT Y
LET GAMMA = <value>
CHI-SQUARE GEOMETRIC EXTREME EXPONENTIAL GOODNESS OF FIT TEST Y
LET GAMMA = <value>
KOLMOGOROV-SMIRNOV GEOMETRIC EXTREME EXPONENTIAL GOODNESS OF FIT TEST Y
b) Johnson SB
JSBCDF(X,ALPHA1,ALPHA2)
JSBPDF(X,ALPHA1,ALPHA2)
JSBPPF(P,ALPHA1,ALPHA2)
LET ALPHA1 = <value>
LET ALPHA2 = <value>
LET Y = JOHNSON SB RANDOM NUMBERS FOR I = 1 1 100
LET ALPHA1 = <value>
LET ALPHA2 = <value>
JOHNSON SB PROBABILITY PLOT Y
JOHNSON SB PPCC PLOT Y
LET ALPHA1 = <value>
LET ALPHA2 = <value>
CHI-SQUARE JOHNSON SB GOODNESS OF FIT TEST Y
LET ALPHA1 = <value>
LET ALPHA2 = <value>
KOLMOGOROV-SMIRNOV JOHNSON SB GOODNESS OF FIT TEST Y
c) Johnson SU
JSUCDF(X,ALPHA1,ALPHA2)
JSUPDF(X,ALPHA1,ALPHA2)
JSUPPF(P,ALPHA1,ALPHA2)
LET ALPHA1 = <value>
LET ALPHA2 = <value>
LET Y = JOHNSON SU RANDOM NUMBERS FOR I = 1 1 100
LET ALPHA1 = <value>
LET ALPHA2 = <value>
JOHNSON SU PROBABILITY PLOT Y
JOHNSON SU PPCC PLOT Y
LET ALPHA1 = <value>
LET ALPHA2 = <value>
CHI-SQUARE JOHNSON SU GOODNESS OF FIT TEST Y
LET ALPHA1 = <value>
LET ALPHA2 = <value>
KOLMOGOROV-SMIRNOV JOHNSON SU GOODNESS OF FIT TEST Y
d) Generalized Tukey-Lambda
Note: still being tested/developed. In particular,
negative values of shape parameter are not working.
GLDCDF(X,LAMBDA3,LAMBDA4)
GLDPDF(X,LAMBDA3,LAMBDA4)
GLDPPF(P,LAMBDA3,LAMBDA4)
LET LAMBDA3 = <value>
LET LAMBDA4 = <value>
LET Y = GENERALIZED TUKEY LAMBDA RANDOM NUMBERS FOR I = 1 1 100
LET LAMBDA3 = <value>
LET LAMBDA4 = <value>
GENERALIZED TUKEY LAMBDA PROBABILITY PLOT Y
GENERALIZED TUKEY LAMBDA PPCC PLOT Y
LET LAMBDA3 = <value>
LET LAMBDA4 = <value>
CHI-SQUARE GENERALIZED TUKEY LAMBDA GOODNESS OF FIT TEST Y
LET LAMBDA3 = <value>
LET LAMBDA4 = <value>
KOLMOGOROV-SMIRNOV GENERALIZED TUKEY LAMBDA GOODNESS OF FIT TEST Y
2) Added support for the following new statistics.
a) LET A = BIWEIGHT LOCATION Y
b) LET A = BIWEIGHT SCALE Y
For more information, enter the following commands:
HELP BIWEIGHT LOCATION
HELP BIWEIGHT SCALE
3) Added support for a biweight based confidence interval:
BIWEIGHT CONFIDENCE INTERVAL Y
For more information, enter the following command:
HELP BIWEIGHT CONFIDENCE INTERVAL
4) Added the following command:
SET BOX PLOT WIDTH <VARIABLE/FIXED>
This specifies whether box plots are drawn with fixed width
or variable width boxes. In variable width box plots, the
width of each box is proportional to its group sample size.
That is, the largest width is used for the box plot with the
largest sample size. Each remaining box plot is scaled by a
factor equal to its sample size divided by the maximum
sample size.
The default is variable width. This is recommended in most cases
as it conveys additional information regarding the relative
sample sizes. However, there are cases where it is desirable
to turn this feature off (e.g., when multiple BOX PLOT commands
are used to overlay box plots on the same page).
5) Added the following commands:
SET 4PLOT MULTIPLOT <ON/OFF>
SET 6PLOT MULTIPLOT <ON/OFF>
Setting these switches ON specifies that the multiplot corner
coordinates will be used to size the 4-PLOT and 6-PLOT,
respectively. The default is OFF (i.e., the plot sizes are
hard-coded to a default value). If set to ON, then you
can use the MULTIPLOT CORNER COORDINATES to size the
graphs.
6) ROBUSTNESS PLOT was added as a synonym for BLOCK PLOT.
7) Support was added for the Scalable Vector Graphics (SVG)
graphics output. SVG is an XML based vector graphics format
that is expected to become increasingly popular for web based
applications. SVG format files can also be imported into
several popular graphics editing programs. For more information,
enter
HELP SVG
8) The VERSION command was re-activated.
9) Fixed several bugs.
----------------------------------------------------------------------
The following enhancements were made to DATAPLOT May-October 2001.
----------------------------------------------------------------------
1) Added support for kernel density plots. Enter
HELP KERNEL DENSITY PLOT
for details.
2) Added the following command:
CONSENSUS MEAN PLOT
This plot summarizes the results of a consensus means analysis.
Enter
HELP CONSENSUS MEANS PLOT
for details.
3) Added the following probability distributions.
a) Inverted Weibull
IWECDF(X,GAMMA)
IWEPDF(X,GAMMA)
IWEPPF(P,GAMMA)
IWEHAZ(X,GAMMA)
IWECHAZ(X,GAMMA)
LET GAMMA = <value>
LET Y = INVERTED WEIBULL RANDOM NUMBERS FOR I = 1 1 100
LET GAMMA = <value>
INVERTED WEIBULL PROBABILITY PLOT Y
INVERTED WEIBULL PPCC PLOT Y
LET GAMMA = <value>
CHI-SQUARE INVERTED WEIBULL GOODNESS OF FIT TEST Y
LET GAMMA = <value>
KOLMOGOROV-SMIRNOV INVERTED WEIBULL GOODNESS OF FIT TEST Y
b) Log Double Exponential
LDECDF(X,ALPHA)
LDEPDF(X,ALPHA)
LDEPPF(P,ALPHA)
LET ALPHA = <value>
LET Y = LOG DOUBLE EXPONENTIAL RANDOM NUMBERS FOR I = 1 1 100
LET ALPHA = <value>
LOG DOUBLE EXPONENTIAL PROBABILITY PLOT Y
LOG DOUBLE EXPONENTIAL PPCC PLOT Y
LET ALPHA = <value>
CHI-SQUARE LOG DOUBLE EXPONENTIAL GOODNESS OF FIT TEST Y
LET ALPHA = <value>
KOLMOGOROV-SMIRNOV LOG DOUBLE EXPONENTIAL GOODNESS OF FIT TEST Y
4) Added support for random number generation for the following distributions:
LET Y = COSINE RANDOM NUMBERS FOR I = 1 1 100
LET Y = ANGLIT RANDOM NUMBERS FOR I = 1 1 100
LET Y = HYPERBOLIC SECANT RANDOM NUMBERS FOR I = 1 1 100
LET Y = ARCSIN RANDOM NUMBERS FOR I = 1 1 100
LET Y = HALF-LOGISTIC RANDOM NUMBERS FOR I = 1 1 100
LET GAMMA = <value>
LET Y = DOUBLE WEIBULL RANDOM NUMBERS FOR I = 1 1 100
LET GAMMA = <value>
LET Y = DOUBLE GAMMA RANDOM NUMBERS FOR I = 1 1 100
LET GAMMA = <value>
LET Y = INVERTED GAMMA RANDOM NUMBERS FOR I = 1 1 100
LET GAMMA = <value>
LET Y = LOG GAMMA RANDOM NUMBERS FOR I = 1 1 100
LET GAMMA = <value>
LET Y = GENERALIZED EXTREME VALUE RANDOM NUMBERS FOR I = 1 1 100
LET DELTA = <value>
LET Y = LOG LOGISTIC RANDOM NUMBERS FOR I = 1 1 100
LET BETA = <value>
LET Y = BRADFORD RANDOM NUMBERS FOR I = 1 1 100
LET B = <value>
LET Y = RECIPROCAL RANDOM NUMBERS FOR I = 1 1 100
LET C = <value>
LET B = <value>
LET Y = GOMPERTZ RANDOM NUMBERS FOR I = 1 1 100
LET P = <value>
LET Y = POWER NORMAL RANDOM NUMBERS FOR I = 1 1 100
LET P = <value>
LET SD = <value>
LET Y = POWER LOGNORMAL RANDOM NUMBERS FOR I = 1 1 100
LET ALPHA = <value>
LET BETA = <value>
LET Y = POWER EXPONENTIAL RANDOM NUMBERS FOR I = 1 1 100
LET ALPHA = <value>
LET BETA = <value>
LET Y = ALPHA RANDOM NUMBERS FOR I = 1 1 100
LET GAMMA = <value>
LET THETA = <value>
LET Y = EXPONENTIATED WEIBULL RANDOM NUMBERS FOR I = 1 1 100
5) Extended the ppcc plot to handle distributions with 2
shape parameters. Specifically,
BETA PPCC PLOT
GOMPERTZ PPCC PLOT
ALPHA PPCC PLOT
EXPONENTIAL POWER PPCC PLOT
EXPONENTIATED WEIBULL PPCC PLOT
This generates a 3-d plot of ppcc value over the range of
values taken by the 2 shape parameters.
Support for several additional 2-shape parameter distributions
is still being tested.
Enter HELP PPCC PLOT for details.
6) Made some updates to the STANDARDIZE command.
a) LET Y2 = USCORE Y X1 X2
This syntax generates a u-score (i.e., subtract the minimum
and divide by the range). This effectively translates
the variable to a uniform (0,1) scale (much as the z-score
translates to a standard normal scale).
b) LET Y2 = SCALE STANDARDIZE Y X1 X2
This divides by the scale statistic, but does not subtract
the location statistic first.
c) Support was added for additional location and scale
statistics.
Enter HELP STANDARDIZE for details.
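The u-score in a) can be sketched in Python/NumPy as follows. The
function name and data are hypothetical, and the sketch assumes
that the minimum and range are computed separately within each
combination of the group variables, in the same way that
STANDARDIZE works by group.

```python
import numpy as np

def uscore(y, *groups):
    """Subtract the group minimum and divide by the group range."""
    y = np.asarray(y, dtype=float)
    out = np.empty_like(y)
    # One key per row; with no group variables every row shares a key.
    keys = list(zip(*groups)) if groups else [()] * y.size
    for key in set(keys):
        mask = np.array([k == key for k in keys])
        lo, hi = y[mask].min(), y[mask].max()
        out[mask] = (y[mask] - lo) / (hi - lo)
    return out

y = [1, 2, 3, 10, 20, 30]
x1 = [1, 1, 1, 2, 2, 2]
u = uscore(y, x1)
```

Within each group the smallest value maps to 0 and the largest to
1. A group whose values are all equal has a range of zero, which
this sketch does not handle.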
7) Added the command
LET Y2 = CROSS TABULATE <stat> Y X1 X2
where <stat> is one of approximately 25 statistics.
This command is related to, but different than, the
analysis command CROSS TABULATE. This command stores
the value of the cross tabulated statistic in
each row of Y2 (where Y2 is the same length as the original
array Y). The purpose of this form of the cross tabulation
is to allow the cross tabulated values to be used in
subsequent computations (e.g., to compute statistics not
supported directly by Dataplot).
For more information, enter the following command:
HELP CROSS TABULATE (LET)
In this case, you need to specify the "(LET)" in order to
avoid ambiguity with other CROSS TABULATE commands.
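The difference from the analysis command can be seen in a short
Python/NumPy sketch: the statistic for each cell is copied to every
row belonging to that cell, so the result has the same length as Y.
The function name and data are hypothetical and only illustrate
the description above.

```python
import numpy as np

def cross_tabulate(stat, y, x1, x2):
    """Store the cell statistic in every row of that cell."""
    y = np.asarray(y, dtype=float)
    keys = list(zip(x1, x2))
    out = np.empty_like(y)
    for key in set(keys):
        mask = np.array([k == key for k in keys])
        out[mask] = stat(y[mask])
    return out

y = [1, 3, 5, 7, 10, 20]
x1 = [1, 1, 1, 1, 2, 2]
x2 = [1, 1, 2, 2, 1, 1]
y2 = cross_tabulate(np.mean, y, x1, x2)
```

Because Y2 lines up with Y row by row, expressions such as Y - Y2
can be formed directly in subsequent computations.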
8) Added support for the following new statistics.
a) LET A = INTERQUARTILE RANGE Y
For more information, enter the following command:
HELP INTERQUARTILE RANGE
9) Added the following commands:
LET A = COMMON DIGITS Y
LET A = NUMBER OF COMMON DIGITS Y
These commands return the common digits, and the number of
common digits, of a vector of numbers. For example, given
the numbers 3.214, 3.216, 3.217, and 3.219, the common digits
are 3.21 and the number of common digits is 2. The common digits
are tested to the RIGHT of the decimal point only (although
Dataplot does include the portion to the left of the decimal
point when returning the value of the common digits). If the
numbers do not match in their integer portion, Dataplot does
not return any common digits. This is a convenience command
that was added to simplify some macros we were writing.
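The rule described above can be sketched in Python as follows. The
function name and the "decimals" argument (how many digits to the
right of the decimal point are compared) are assumptions made for
this illustration; Dataplot's own handling of precision may
differ.

```python
def common_digits(values, decimals=3):
    """Return (common digits, number of common digits)."""
    texts = [f"{v:.{decimals}f}" for v in values]
    int_parts = {t.split(".")[0] for t in texts}
    if len(int_parts) != 1:
        # Integer portions differ, so nothing is in common.
        return None, 0
    fracs = [t.split(".")[1] for t in texts]
    count = 0
    for chars in zip(*fracs):
        if len(set(chars)) != 1:
            break
        count += 1
    common = float(f"{int_parts.pop()}.{fracs[0][:count]}0")
    return common, count

result = common_digits([3.214, 3.216, 3.217, 3.219])
```

For the numbers in the example above, this returns 3.21 for the
common digits and 2 for the number of common digits.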
10) Added the following command:
LET Y = MATCH X VAL
LET Z2 = MATCH X VAL Z
This command matches each value in VAL against X. For the
first syntax, it returns the index of the X array where the
match was found. A match is the value of X that is closest
to the given value of VAL in absolute difference (i.e., an
exact match is not required, so a match will always be
returned). For the second syntax,
the index is used to extract the value in Z corresponding to
the matched index. This second syntax in fact implements the
most common use of this command (i.e., the index is usually
not of interest in itself, rather it is used to extract
appropriate values from another variable).
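Both syntaxes can be sketched in Python/NumPy as follows. The
function name is hypothetical, and the sketch assumes that the
returned index counts rows starting from 1.

```python
import numpy as np

def match(x, val, z=None):
    """Nearest-value match of each element of VAL against X."""
    x = np.asarray(x, dtype=float)
    val = np.atleast_1d(np.asarray(val, dtype=float))
    # For each value in VAL, find the row of X with the smallest
    # absolute difference.
    idx = np.abs(x[None, :] - val[:, None]).argmin(axis=1)
    if z is None:
        return idx + 1            # first syntax: 1-based row index
    return np.asarray(z)[idx]     # second syntax: value of Z at that row

x = [1.0, 2.0, 3.0, 4.0]
val = [2.2, 3.9]
z = [10, 20, 30, 40]

index = match(x, val)
z2 = match(x, val, z)
```

Here 2.2 is closest to the second value of X and 3.9 to the fourth,
so the first syntax gives 2 and 4 and the second gives 20 and 40.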
11) A few bug fixes were made. In particular,
a) The ANDERSON DARLING WEIBULL TEST was modified slightly.
You no longer get an error message if the GAMMA parameter
is not specified. This GAMMA was not actually being used.
The command now does the following:
i) If no GAMMA (shape parameter) or BETA (scale parameter)
has been predefined, maximum likelihood estimates are
computed automatically.
ii) If GAMMA and BETA are pre-defined, then the test is
based on these values. This allows you to test the goodness
of fit for parameter values obtained by methods other than
maximum likelihood.
b) Made a few fixes in the SINGLE SAMPLE ACCEPTANCE PLAN
command. Specifically, it now requires P1 < P2. In addition,
a maximum number of iterations has been added to detect
convergence problems (although this is usually caused by P1 > P2).
Also modified the documentation for this command to provide
more realistic examples.
c) Fixed some bugs in the GD device driver (JPEG and PNG
support).
d) The COLUMN LIMITS command now works with READ STRING
(when the string is read from a file).
e) The output for a number of confirmatory tests was modified
for clarity. Note that the underlying computations were
not modified, just the presentation of the output.
----------------------------------------------------------------------
The following enhancements were made to DATAPLOT February-April 2001.
----------------------------------------------------------------------
1) The online help files have been substantially updated.
Specifically, the additions over the last three years are
now (mostly) incorporated into the help files and the
web documentation.
2) Added support for generating JPEG and PNG image formats.
Enter HELP GD for details. These device drivers are dependent
on several external libraries, so support may not be
available on all platforms.
3) Added the following command:
CHARACTER AUTOMATIC SIGN <varname>
This is similar to the CHARACTER AUTOMATIC command. However,
it makes the character "+", "-", or "0" depending on the
sign of the value in <varname>. This is sometimes useful
when writing macros for design of experiment applications.
4) PROBE is used to determine the current value of internal
Dataplot variables. Added the following values that can
now be accessed with PROBE.
FX1MIN
FX1MAX
FY1MIN
FY1MAX
GX1MIN
GX1MAX
GY1MIN
GY1MAX
DX1MIN
DX1MAX
DY1MIN
DY1MAX
The FX1MIN, FX1MAX, FY1MIN, FY1MAX define the current
axis limits, DX1MIN, DX1MAX, DY1MIN, DY1MAX define the
current data limits, and GX1MIN, GX1MAX, GY1MIN, GY1MAX
are the current "fixed" limits (i.e., limits set by the
LIMITS command).
The most common use is to PROBE the values for FX1MIN,
FX1MAX, FY1MIN, and FY1MAX to determine the current
axis limits. This can sometimes be useful when writing
complex macros. For example,
PLOT SIN(X) FOR X = 0 0.1 6
PROBE FX1MIN
LET XAXISMIN = PROBEVAL
PROBE FX1MAX
LET XAXISMAX = PROBEVAL
PROBE FY1MIN
LET YAXISMIN = PROBEVAL
PROBE FY1MAX
LET YAXISMAX = PROBEVAL
5) Added the following command:
LET Y2 = STANDARDIZE Y
LET Y2 = STANDARDIZE Y X1
LET Y2 = STANDARDIZE Y X1 X2
This command standardizes a variable, Y, based on either
no groups, one group, or two groups. You can standardize
for both mean and standard deviation or just by the mean.
By standardize, we mean subtract the mean and divide by the
standard deviation. Alternative measures for location and
scale are allowed. For details, enter
HELP STANDARDIZE
6) By default, the size of characters in subscripts or superscripts
is set to 1/2 the current character size. You can set the
scale factor using the following commands:
SET SUPERSCRIPT VERTICAL SCALE <value>
SET SUPERSCRIPT HORIZONTAL SCALE <value>
These set the height and width of the character respectively.
7) The CAPABILITY command was significantly enhanced. Enter
HELP CAPABILITY
for details.
8) Support was added for orthogonal distance regression. Enter
HELP ORTHOGONAL DISTANCE FIT
for details.
9) Support was added for consensus means using Mandel-Paule,
modified Mandel-Paule, Vangel-Rukhin (maximum likelihood),
Schiller-Eberhardt, and bounds on bias (BOB) methods. Enter
HELP CONSENSUS MEANS
for details.
10) Some bugs were fixed.
In particular, diagrammatic graphics drawn in data units rather
than screen units (e.g., DRAWDATA, MOVEDATA) were not drawn
correctly for log scales. This has been fixed. An error
message is printed if a WEIBULL or NORMAL axis scale is detected.
----------------------------------------------------------------------
The following enhancements were made to DATAPLOT January 2000.
----------------------------------------------------------------------
1) Added the following commands.
a) LEGEND <numb> UNITS <DATA/SCREEN>
This command allows legend coordinates to be interpreted
in either the screen 0 to 100 units (SCREEN, the default) or in
units of the plot (DATA).
b) ...LABEL OFFSET <value>
...LABEL JUSTIFICATION <value>
These commands allow you to set the horizontal offset
(in Dataplot 0 to 100 screen units) and justification of
the axis labels (the LABEL DISPLACEMENT command sets the
vertical offset). These commands were motivated by some
of the new multiplots discussed below. However, they
can be used at any time (although usage should be rare).
c) You can use CR() in text strings to start a new line.
Up to 10 lines may be entered, although more than 3 lines
is rare. Each of the lines uses the same plot attributes
(e.g., all left justified or all center justified).
This applies to both hardware and software fonts and
is used for all types of text. The most common usages
are to create multiline titles and legends and to use
multiple lines with alphabetic tic mark labels.
d) By default, the Dataplot HISTOGRAM and FREQUENCY POLYGONS
range from -6 to +6 standard deviations from the mean.
Although in most cases, this is more than adequate,
Dataplot did not warn you if points were found outside
this range. Dataplot now flags the number of points
outside this range (separate messages for points below
and points above). No message is printed if all points
are within the range. The CLASS LOWER and CLASS UPPER
commands can be used if you need to widen the range.
e) Dataplot now supports row labels and variable labels.
Row labels are strings of up to 32 characters that
are used to identify a row of the data. To define
the row label, do something like the following:
SKIP 25
COLUMN LIMITS 1 19
READ ROW LABELS AUTO79.DAT
COLUMN LIMITS 20 132
READ AUTO79.DAT Y1 TO Y12
The COLUMN LIMITS are almost always used when reading the
row labels. Typically, you read a file once for the
numeric data and then a second time for the row labels.
Currently, the use of row labels is only supported
with the CHARACTER command (see below). However, we
anticipate additional usage of this feature in future
updates.
A long label (up to 52 characters) can be associated with a
variable name (which is currently limited to 8 characters).
Variable labels are specified with the following command (note
that the variable name must already be defined):
VARIABLE LABEL <var name> <var label>
The label may contain spaces. Variable labels are currently
supported in three ways:
i) Some of the new multi-plotting commands (discussed
below) automatically make use of variable labels.
ii) You can use the "^" to substitute a variable label
for a variable name in text strings. For example,
LET Y = NORMAL RAND NUMBERS FOR I = 1 1 100
VARIABLE LABEL Y NORMAL RANDOM NUMBERS
Y1LABEL ^Y
PLOT Y
Previously, Dataplot only supported substitutions
for parameters and strings. Now, if a variable name
is found, it checks to see if a label has been defined.
If yes, the label is substituted for the variable name.
If not, the variable name is left as is (with the
"^" removed).
iii) The X1LABEL AUTOMATIC and Y1LABEL AUTOMATIC commands
will now substitute the variable label for the variable
name on the x and y axes respectively.
f) The following special options were added for the
CHARACTER command:
ROWID - uses the row number as the plot character
ROWLABEL - uses the row label as the plot character
XVALUE - uses the x-coordinate of the point as the
plot character
YVALUE - uses the y-coordinate of the point as the
plot character
XYVALUE - uses (x-coor,y-coor) as the plot character
TVALUE - uses the tag value as the plot character
(Dataplot assigns a curve-id, the tag,
to each point)
ZVALUE - this is a special form that is specific to
certain commands. For a few commands (currently
the DEX CONTOUR PLOT and the CROSS TABULATE
PLOT, but we expect a few
additional plots to support this form in future
releases), Dataplot writes a numeric value into
an internal array. The value in this array is
used as the plot symbol. Using this with
unsupported plot types may have unpredictable
results (it will depend on what is stored in
the internal array). This option is typically
set automatically by Dataplot in the
background, so currently users should not
set this directly.
The ROWID and ROWLABEL are typically only used for the
PLOT command (i.e., not for HISTOGRAM, etc.). This option
keeps track of any subsetting (i.e., SUBSET/FOR/EXCEPT
clauses on the plot command) when identifying the point.
However, the results may be unpredictable for graphics other
than the PLOT command.
The most common use of this command is to identify specific
points on the plot (typically with the ROWLABEL option).
A typical sequence would be
CHARACTER X
PLOT Y X
PRE-ERASE OFF
LIMITS FREEZE
CHARACTER ROWLABEL
PLOT Y X SUBSET Y > 90
g) The STATISTIC PLOT command now supports the
CORRELATION, RANK CORRELATION, COVARIANCE, and RANK
COVARIANCE cases.
h) The command
SET PARAMETER EXPANSION <NUMERIC/EXPONENTIAL>
was added. This command applies when substituting the
value of a parameter using "^". Normally, this was
intended for putting numeric values in text labels. In this
case, it is desirable to limit the number of digits. However,
when used with the FIT command (parameters you want to remain
constant rather than be fitted are often entered this way),
you may need to specify high precision. If NUMERIC (the
default) is specified, the current algorithm for parameter
substitution is used. If EXPONENTIAL is specified, the
parameter is entered using scientific notation. For example,
(0.123456789012*10**(2))
i) The command
SET SORT DIRECTION <ASCENDING/DESCENDING>
was added. This command specifies whether the sorts
performed by SORT and SORTC are ascending or descending
sorts (the default is ascending).
2) The following new plots were added.
a) INTERACTION PLOT Y X1 ... XK
<stat> INTERACTION PLOT Y X1 ... XK
These plot Y versus X1*X2* ... *XK and are primarily intended
for DEX applications. Specifically, it serves as the
building block for the DEX INTERACTION PLOT discussed below.
It is actually the DEX INTERACTION PLOT that is typically
generated by the user. This command supports the same
set of statistics as the STATISTIC PLOT command.
The case of most interest for the DEX plots is 2 X variables,
but these plots will in fact handle an arbitrary number
up to 25.
b) CROSS TABULATE <stat> PLOT Y X1 X2
CROSS TABULATE <stat> PLOT Y1 Y2 X1 X2
CROSS TABULATE PLOT X1 X2
CROSS TABULATE PLOT <stat> X1 X2
This command performs a cross-tabulation on X1 and X2.
It computes the statistic given by <stat> for the response
values (Y) in each cell of the cross tabulation. The list
of supported statistics is the same as for the
STATISTIC PLOT command. Most of the supported statistics
expect a single response variable. A few expect two
(e.g., LINEAR CORRELATION). The COUNT (or NUMBER) statistic
expects no response variables.
The output of this command plots the computed statistic
on the Y axis. The X axis coordinate is determined from
the two group variables in the following way:
i) The levels of the first group variable (X1 in the above
examples) are plotted at 1, 2, 3, etc.
ii) For each level of the group 1 variable, the levels of
the group 2 variable are scaled +/- 0.2 around the
level of the group 1 variable.
For example, if X1 has 2 levels (at 1 and 2) and X2 has
3 levels (1, 2, and 3), then the following x-coordinates
are used:
X1 X2 X-COOR
============================
1 1 0.8
1 2 1.0
1 3 1.2
2 1 1.8
2 2 2.0
2 3 2.2
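The placement rule in the table above can be sketched in Python. This is
an illustration only, not Dataplot code: the function name cross_tab_x is
hypothetical, the levels of the second group variable are assumed to be
spaced evenly across the +/- 0.2 band, and a single level is assumed to
sit at the center.

```python
def cross_tab_x(level1_index, level2_index, n_levels2, half_width=0.2):
    """X position of a cell; both level indices are 1-based."""
    if n_levels2 == 1:
        # Assumption: a lone level is plotted at the group 1 position.
        return float(level1_index)
    # Spread the group 2 levels evenly over [-half_width, +half_width].
    step = 2.0 * half_width / (n_levels2 - 1)
    offset = -half_width + step * (level2_index - 1)
    return level1_index + offset


# Reproduce the table for X1 with 2 levels and X2 with 3 levels.
for i in (1, 2):
    for j in (1, 2, 3):
        print(i, j, round(cross_tab_x(i, j, 3), 2))
```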
The syntax CROSS TABULATE PLOT X1 X2 is a special case. It plots
the value of X1 on the X axis and the value of X2 on the
Y axis. The plot character is then set to the count
for that cell (this is done automatically and you do not need
to set the plot character). This form of the plot has
application in the design of experiments.
Note that this command is an extension of the STATISTIC PLOT
command. However, instead of one group variable, there
are two group variables.
The command
SET CROSS TABULATE PLOT DIMENSION <1/2>
can be used to specify an alternative format for this
plot. If "1", then the format of the plot is described
as above. If "2", then the format is similar to the
CROSS TABULATE PLOT X1 X2 format. That is,
SET CROSS TABULATE PLOT DIMENSION 2
CROSS TABULATE MEAN PLOT Y X1 X2
will print the value of the mean of Y at the value of X1 on
the X axis and the value of X2 on the Y axis. Essentially,
this is the tabled values in graphic format. You can
use this format to generate plots where you want to print
a numeric value at (X,Y), that is some value other than
X or Y. You can define a response variable Z with the
desired values to print and then use the CROSS TABULATE
MEAN PLOT (if there is only one value, the mean is equal
to that value).
c) DEX CONTOUR PLOT Y X1 X2 YCONT
This plots a dex contour plot for the case when X1 and X2
have 2 levels (represented by the values -1 and 1). In
addition, one or more center points (X1 and X2 both 0)
may be present. Any points where X1 and X2 are not equal
to -1, 1, or 0 are ignored. The array YCONT contains the
contour levels.
The appearance of the plot is controlled by the settings
of the LINE and CHARACTER command. Specifically,
trace 1 = label for center point and the points
at (-1,-1), (-1,1), (1,1), (1,-1). The
character setting should be ZVAL and line
should be blank.
trace 2 = center point. If no center point was specified,
this point is not generated (and the CHAR and LINE
settings need to be adjusted accordingly).
trace 3 = line connecting (-1,-1), (1,-1), (1,1), (-1,1)
trace 4+= the contour lines start with trace 4. There is
one trace for each value of YCONT.
This command implements the algorithm previously available
in the built-in DEXCONT.DP macro as a Dataplot command.
As an example of this command, you can enter
SKIP 25
READ BOXYIELD.DAT Y X1 X2
LET YCONT = SEQUENCE 50 2 70
CHARACTER ZVAL CIRCLE CIRCLE
CHARACTER FILL OFF ON ON
LINE BLANK BLANK BLANK
DEX CONTOUR PLOT Y X1 X2 YCONT
d) YATES CUBE PLOT Y X1 X2 X3
This plots a Yates cube plot for the case when X1, X2, and
X3 are factor variables with exactly two levels. It plots
the value of the response variable, Y, at each vertex.
This plot is used in 2**(3) factorial and fractional
factorial designs.
3) Dataplot now supports sub-regions on plots. Sub-regions are
motivated by the desire to denote "engineering limits"
on a plot. That is, a rectangle, denoting an acceptance
region in both the X and Y directions, is drawn on the
plot and then the plots are overlaid on top of this.
Although the subregion capability was motivated for the
purpose of denoting engineering limits, they can in fact
be used for whatever purpose you want.
The SUBREGION commands are:
SUBREGION <ON/OFF> <ON/OFF> <ON/OFF> ....
SUBREGION XLIMITS <lower value> <upper value>
SUBREGION <id> XLIMITS <lower value> <upper value>
SUBREGION YLIMITS <lower value> <upper value>
SUBREGION <id> YLIMITS <lower value> <upper value>
Up to 10 subregions may be defined. In most applications,
only a single subregion is plotted. The SUBREGION <ON/OFF>
switch determines whether or not the given subregion is
plotted. The SUBREGION XLIMITS/YLIMITS commands specify
the lower and upper bounds of the rectangle. If no
<id> is specified, the limits are set for the first subregion.
If <id> is specified, it should be between 1 and 10.
You do not need to adjust the settings for the CHARACTER, LINE,
BAR, and SPIKE when using subregions. Dataplot automatically
shifts these in the background. The attributes of the SUBREGION
are defined by:
REGION FILL <ON/OFF>
REGION COLOR <COLOR>
REGION BORDER LINE <linetype>
REGION BORDER COLOR <color>
The REGION FILL and REGION COLOR determine the attributes of
the interior of the rectangle. The two most common choices
are to leave it blank or to fill it with some type of light gray
scale color. The attributes of the box border are set with
the REGION BORDER LINE and REGION BORDER COLOR commands. The
standard line types (BLANK, SOLID, DASH, DOTTED, etc.) are
supported. Although only one setting was given above, if you
have defined multiple subregions, then you should define
multiple settings in the above command.
A typical sequence of commands would be
SUBREGION ON
SUBREGION XLIMITS 0.35 0.42
SUBREGION YLIMITS 2000 3000
REGION FILL ON
REGION BORDER LINE DASH
REGION COLOR G90
PLOT ....
SUBREGION OFF
Some points to note about subregions are:
a) The subregions are plotted before any of the plot
curves. The significance of this is that a solid filled
subregion will be drawn and then the regular plot points
are drawn on top. The effect of this can be hardware
dependent. On X11 and Postscript devices, a solid character
can be seen on top of a light gray scale box (if the gray
scale gets too dark, the plot points are no longer
distinguishable). However, on some hardware devices, you may
not be able to see points plotted on top of a solid fill
region. In this case, plot the border of the subregion and
leave the interior blank.
It is this order of plotting that distinguishes the
subregion from simply using a BOX <id> command to plot
rectangular regions on the screen.
b) Although most commonly used with the PLOT command, subregions
can in fact be used with any Dataplot graphics command.
c) Currently, only rectangular subregions are supported.
We expect that to be generalized to polygonal regions
in the future.
4) Dataplot now saves the following internal parameters after
all plots (not just those generated with PLOT):
PLOTCORR - correlation of the X and Y coordinates on the plot
PLOTCOR1 - correlation of the X and Y coordinates on the plot
with a tag value of 1. This can be useful for
plots that generate reference lines (which you
do not want included in the correlation computation)
PLOTYMAX - maximum Y coordinate
YMAXINDE - index of the maximum Y coordinate
PLOTYMIN - minimum Y coordinate
YMININDE - index of the minimum Y coordinate
PLOTXMAX - maximum X coordinate
XMAXINDE - index of the maximum X coordinate
PLOTXMIN - minimum X coordinate
XMININDE - index of the minimum X coordinate
NACCEPT - number of plot points inside the first subregion
(0 if no subregions defined)
NREJECT - number of plot points outside the first subregion
(0 if no subregions defined)
NTOTAL - number of plot points (NACCEPT + NREJECT)
(0 if no subregions defined)
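As a rough illustration of what NACCEPT, NREJECT, and NTOTAL count, the
following Python sketch classifies plot points against one rectangular
subregion. It is not Dataplot code: the function name is hypothetical, and
treating points that lie exactly on the boundary as accepted is an
assumption.

```python
def count_accept_reject(points, xlimits, ylimits):
    """Return (naccept, nreject, ntotal) for one rectangular subregion."""
    xlo, xhi = xlimits
    ylo, yhi = ylimits
    # Assumption: boundary points count as inside the subregion.
    naccept = sum(1 for x, y in points
                  if xlo <= x <= xhi and ylo <= y <= yhi)
    nreject = len(points) - naccept
    return naccept, nreject, naccept + nreject


pts = [(0.36, 2500), (0.40, 2900), (0.50, 2500), (0.38, 3500)]
print(count_accept_reject(pts, (0.35, 0.42), (2000, 3000)))
```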
5) The following multiplots were added:
SCATTER PLOT MATRIX Y1 Y2 ... YK
FACTOR PLOT Y X1 ... XK
CONDITIONAL PLOT Y X TAG
a) SCATTER PLOT MATRIX Y1 ... YK
This generates all the pairwise scatter plots of Y1 ... YK
on a single page.
b) FACTOR PLOT Y X1 ... XK
This generates the plots Y VS X1, Y VS X2, .... , Y VS XK
on a single page.
c) CONDITIONAL PLOT Y X TAG
This generates PLOT Y VERSUS X for each unique value in
TAG on a single page.
There are a lot of variations possible with these types of
plots. For example, the basic concept is not limited to
scatter plots. For example, you can generate all the pairwise
bihistograms instead of the pairwise scatter plots. There are
many options in terms of labeling, what plot goes on the
diagonal, and so on.
There are various SET commands that control the appearance
and nature of these plots. Enter
HELP SCATTER PLOT MATRIX
HELP CONDITIONAL PLOT
HELP FACTOR PLOT
for a complete description of what is available.
Two variations of the SCATTER PLOT MATRIX are important enough
to be given special names:
DEX INTERACTION PLOT
YOUDEN MATRIX PLOT
These are described under HELP SCATTER PLOT MATRIX.
6) Fixed the following bugs.
a) The MULTIPLOT SCALE FACTOR did not work correctly with
the software fonts.
b) Entering "character blank", i.e., the blank is in lower case,
plotted BLAN as the plot character when DEVICE 1 FONT SIMPLEX
was used.
c) Using SP() with a software font did not work.
d) The BOX SHADOW OFF command was fixed to set the shadow
height and width to zero rather than to the default.
----------------------------------------------------------------------
The following enhancements were made to DATAPLOT January - July 1999.
----------------------------------------------------------------------
1) Modified the IF command so that if there is an error (e.g., one
of the parameters is not defined), the IF status is set to
FALSE rather than being undefined.
2) Added the following time series commands.
a) Added the command
LET PERIOD = <value>
LET START = <value>
SEASONAL SUBSERIES PLOT Y
A seasonal subseries plot is used to determine if there
is significant seasonality in a time series. Instead of
a straight time order plot, it splits the plot into
the corresponding seasons (or periods). For example, for
monthly data, all the January values are plotted, then all
the February values, and so on. Reference lines are drawn
at the seasonal means.
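The grouping behind a seasonal subseries plot can be sketched in Python as
follows. This is an illustration, not the Dataplot implementation. The
function name is hypothetical, and the reading of START as the season of
the first observation is an assumption.

```python
from statistics import mean


def seasonal_subseries(y, period, start=1):
    """Split a series into seasons 1..period and compute seasonal means."""
    groups = {s: [] for s in range(1, period + 1)}
    for i, value in enumerate(y):
        # Assumption: observation 1 belongs to season `start`.
        season = (start - 1 + i) % period + 1
        groups[season].append(value)
    # The seasonal means give the heights of the reference lines.
    means = {s: mean(v) for s, v in groups.items() if v}
    return groups, means


groups, means = seasonal_subseries([1, 2, 3, 4, 3, 4, 5, 6], period=4)
print(groups[1], means[1])
```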
b) Added the command
LET PERIOD = <value>
LET STLWIDTH = <value>
LET STLSDEG = <0/1>
LET STLTDEG = <0/1>
LET STLROBST = <0/1>
SEASONAL LOWESS Y (or SEASONAL LOESS Y)
READ DPST1F.DAT SEAS TREND
The SEASONAL LOWESS command decomposes a time series into
trend, seasonal, and residual components using techniques
based on locally weighted least squares. That is,
X(t) = TREND(t) + SEAS(t) + RES(t)
The seasonal and trend components are written to the file
DPST1F.DAT (dpst1f.dat on Unix systems) and can be read
back into Dataplot for further plotting and analysis. The
internal variable RES contains the residual component and
the internal variable PRED contains the trend plus the
seasonality component.
The SEASONAL LOWESS command accepts a number of options
which can be defined by the LET commands above. The most
important is the PERIOD parameter which identifies the number
of seasons (e.g., 12 for monthly data). The STLWIDTH
parameter identifies the number of data points to use
in the LOWESS steps and defaults to N/10. It is similar
to specifying the LOWESS FRACTION for standard LOWESS
smoothing. The more points used, the more smoothing that
occurs. The STLSDEG and STLTDEG parameters identify the
polynomial degree used in the lowess for the seasonal and
trend components respectively. By default, the seasonal
lowess performs some robustness iterations. Enter
LET STLROBST = 1 to suppress this.
This technique is described in
Cleveland, Cleveland, McRae, and Terpenning, "STL: A
Seasonal-Trend Decomposition Procedure Based on Loess",
Statistics Research Report, AT&T Bell Laboratories.
c) Added an ARIMA modeling capability. The command is:
ARMA Y AR DIFF MA SAR SDIFF SMA SPERIOD
where
Y = the response variable
AR = the order of auto-regressive terms
DIFF = number of differences to apply. DIFF is typically
0, 1, or 2. Differencing is one technique for
removing trend.
MA = order of the moving average terms
SAR = order of seasonal auto-regressive terms.
SDIFF = number of seasonal differences to apply. It is
typically 0, 1, or 2.
SMA = order of seasonal moving average terms.
SPERIOD = period for seasonal terms. It defaults to 12
(if a seasonal component is included).
If there is no seasonal component, the last 4 terms may be
omitted.
To minimize the amount of screen output, but also to
keep the maximum amount of information, Dataplot writes
most of the output to files. Specifically,
dpst1f.dat - the parameters and the standard deviations
of the parameters from the ARMA fit. The
order is:
1) Autoregressive terms
2) Seasonal autoregressive terms
3) Mean term
4) Moving average terms
5) Seasonal moving average terms
dpst2f.dat - this file contains:
1) Row number
2) Original series (i.e., Y)
3) Predicted values
4) Standard deviation of predicted values
5) Residuals
6) Standardized residuals
dpst3f.dat - Intermediate output from iterations before
convergence. This is generally useful if
the ARMA fit does not converge.
dpst4f.dat - The parameter variance-covariance matrix.
dpst5f.dat - The forecast values for (N/10)+1 observations
ahead. Specifically,
1) The forecasted values
2) The standard deviation of the forecasted
values.
3) The lower 95% confidence band for the
forecast.
4) The upper 95% confidence band for the
forecast.
Dataplot allows you to define the starting values by
defining the variable ARPAR. The order of the parameters
is as given for the file dpst1f.dat above. By default,
all parameters are set to 1 except for the mean term which
is set to 0.
In addition, you can define the variable ARFIXED to fix
certain parameters to their start values. That is, you
define ARPAR to specify the start values. If the
corresponding element of ARFIXED is zero, the parameter is
estimated as usual. If ARFIXED is one, then the parameter
is fixed at the start value. The most common use of this
is to set certain parameters to zero. For example, if
you fit an AR(2) model and you want the AR(1) term to be
zero, you could enter the following:
LET ARPAR = DATA 0 1
LET ARFIXED = DATA 1 0
Dataplot uses the STARPAC library (developed by
Janet Rogers and Peter Tryon of NIST) to compute the
ARIMA estimates.
ARIMA modeling is covered in many time series texts. It is
beyond the scope of this news file to discuss ARIMA modeling.
However, to use ARIMA models, it is generally recommended
that the series be at least 50 observations long. In addition,
if the series is dominated by the trend and seasonal factors,
an explicit trend, seasonal, and random component decomposition
method, such as the seasonal lowess described above, is
generally preferred to an explicit ARIMA model.
3) Added support for location and scale parameters for an additional
15 distributions. Entering the command
LIST DISTRIBU.
will list the distributions table. This table shows which
distributions support location and scale parameters.
4) Added the following statistics:
Added the CNPK capability index statistics:
LET LSL = <value>
LET USL = <value>
LET A = CNPK Y
This statistic is now also supported for the following plots:
LET LSL = <value>
LET USL = <value>
CNPK PLOT Y X
DEX CNPK PLOT Y
The LSL and USL specify the lower specification and upper
specification engineering limits. The CNPK is a variant of the
CPK capability indices used for non-normal data and is defined as:
CNPK = MIN(A,B)
where
A = (USL-MEDIAN)/(P(0.995)-MEDIAN)
B = (MEDIAN-LSL)/(MEDIAN-P(0.005))
P(0.995) and P(0.005) are the 99.5 and 0.5 percentiles of the
data respectively.
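The CNPK formula can be written out in Python as below. This is a sketch,
not the Dataplot implementation: the function names are hypothetical, and
the percentile rule (the "inclusive" method of statistics.quantiles) is an
assumption, so results may differ slightly from Dataplot for small
samples.

```python
from statistics import median, quantiles


def cnpk_from_percentiles(lsl, usl, med, p_low, p_high):
    """CNPK = MIN(A, B) with A and B as defined in the text above."""
    a = (usl - med) / (p_high - med)
    b = (med - lsl) / (med - p_low)
    return min(a, b)


def cnpk(y, lsl, usl):
    """CNPK from raw data; the percentile method is an assumption."""
    # n=200 gives cut points at 0.5%, 1.0%, ..., 99.5%.
    cuts = quantiles(y, n=200, method="inclusive")
    return cnpk_from_percentiles(lsl, usl, median(y), cuts[0], cuts[-1])


print(cnpk_from_percentiles(lsl=0, usl=10, med=5, p_low=2, p_high=8))
```

Note that the ratio is undefined when a percentile coincides with the
median (for example, heavily tied data); the sketch does not guard against
that case.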
Added the geometric mean and standard deviation and the
harmonic mean statistics.
LET A = GEOMETRIC MEAN Y
LET A = GEOMETRIC STANDARD DEVIATION Y
LET A = HARMONIC MEAN Y
These statistics are now also supported for the following plots:
GEOMETRIC MEAN PLOT Y X
GEOMETRIC STANDARD DEVIATION PLOT Y X
HARMONIC MEAN PLOT Y X
BOOTSTRAP GEOMETRIC MEAN PLOT Y X
BOOTSTRAP GEOMETRIC STANDARD DEVIATION PLOT Y X
BOOTSTRAP HARMONIC MEAN PLOT Y X
JACKNIFE GEOMETRIC MEAN PLOT Y X
JACKNIFE GEOMETRIC STANDARD DEVIATION PLOT Y X
JACKNIFE HARMONIC MEAN PLOT Y X
The geometric mean is defined as:
XGM = (PRODUCT(Xi))**(1/N)
The geometric standard deviation (SD means standard deviation of)
is defined as:
XSD = EXP(SD(LOG(Xi)))
The harmonic mean is defined as:
XHM = N/SUM(1/Xi)
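The three definitions above translate directly into Python. This sketch is
for illustration only. The function names are hypothetical, SD is assumed
to be the sample standard deviation (N-1 divisor), and all data values are
assumed to be positive. The geometric mean is computed through logarithms,
which is algebraically the same as the product form but avoids overflow.

```python
import math
from statistics import stdev


def geometric_mean(x):
    """(PRODUCT(Xi))**(1/N), computed as EXP(MEAN(LOG(Xi)))."""
    return math.exp(sum(math.log(v) for v in x) / len(x))


def geometric_sd(x):
    """EXP(SD(LOG(Xi))); SD uses the N-1 divisor (an assumption)."""
    return math.exp(stdev([math.log(v) for v in x]))


def harmonic_mean(x):
    """N/SUM(1/Xi)."""
    return len(x) / sum(1.0 / v for v in x)


print(geometric_mean([1, 4, 16]), harmonic_mean([1, 2, 4]))
```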
5) Added the Wilks-Shapiro test for normality. The following
commands are equivalent.
WILKS SHAPIRO NORMALITY TEST Y
WILKS SHAPIRO TEST Y
WILKS SHAPIRO Y
There must be at least 3 values in Y. The computed significance
level is not necessarily valid for N >= 5,000. This command
uses algorithm R94 from the Applied Statistics Journal.
6) Added the studentized range CDF and PPF functions.
LET A = SRACDF(X,V,R)
LET A = SRAPPF(P,V,R)
where V is the degrees of freedom and R is the number of
samples. X must be positive, V must be >= 1, and
R must be >= 2. For most applications, R = V + 1. The PPF
function is only supported for values in the range 0.90 to 0.99.
The studentized range is defined as:
Q = Range/(Standard deviation)
The studentized range is used in constructing confidence intervals
and significance levels for tests for multiple comparison in
analysis of variance problems.
7) Updated the Weibull maximum likelihood estimates to support
censored data (both type 1 and type 2 and multiple). It also now
generates confidence intervals for the estimate (for various
significance levels). The command
SET CENSORING TYPE <NONE/1/2/MULTIPLE>
defines the censoring type. The EXPONENTIAL MLE output was
modified to be more readable and consistent with the Weibull
output.
8) Added the following quality control commands.
a) Added the following command to generate binomial based single
sample acceptance plans:
SINGLE SAMPLE ACCEPTANCE PLOT P1 P2 ALPHA BETA
where
P1 = Acceptable Quality Level
P2 = Lot Tolerance Percent Defective
ALPHA = Probability of a Type I error
BETA = Probability of a Type II error
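One common way to derive a binomial single sample plan from these four
inputs is a direct search for the smallest sample size n and acceptance
number c that meet both risk constraints. The Python sketch below shows
that search. It is an assumption that Dataplot uses an equivalent
criterion, and the function names are hypothetical.

```python
from math import comb


def binom_cdf(c, n, p):
    """P(X <= c) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(c + 1))


def single_sample_plan(p1, p2, alpha, beta, max_n=2000):
    """Smallest (n, c): accept lots of quality p1 with prob >= 1-alpha
    and lots of quality p2 with prob <= beta."""
    for n in range(1, max_n + 1):
        for c in range(n + 1):
            if binom_cdf(c, n, p2) > beta:
                break  # larger c only increases the consumer's risk
            if binom_cdf(c, n, p1) >= 1 - alpha:
                return n, c
    raise ValueError("no plan found with n <= max_n")


n, c = single_sample_plan(p1=0.05, p2=0.20, alpha=0.05, beta=0.10)
print(n, c)
```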
b) Added a command to generate the average run length for the
cumulative sum (cusum) control chart. The average run length
is the average number of observations that are entered
before the system is declared out of control.
LET S0 = <value>
LET K = <value>
LET H = <value>
These commands set parameters required by the cusum ARL
calculation. Specifically,
S0 = start-up value for the cumulative sum. This is
usually zero. However, it can be set to a
positive initial value for a fast initial
response (FIR) cusum chart.
H = defines the value which signals that the cusum
is "out of control". A value of 5 is a common
choice.
K = the value of k is set to one half of the smallest
shift in location (in standard deviation units)
that you want to detect. A common choice is a
1-sigma shift, that is k = 0.5.
LET Y = ONE-SIDED CUSUM ARL DELTA
LET Y = CUSUM ARL DELTA
where DELTA defines the difference between the target value
of the process and the true value of the process. This is
a variable that is usually defined to be a sequence of values.
For example,
LET DELTA = SEQUENCE 0 .01 0.5
That is, this command returns the average run length for
a series of values that define the difference between the
target value and the true value of the process.
A typical sequence of commands would be
LET K = 0.5
LET H = 5
LET S0 = 0
LET DELTA = SEQUENCE 0 .01 1.0
LET Y = CUSUM ARL DELTA
PLOT Y DELTA
This command was implemented using Applied Statistics
algorithm 258. If unreasonable values are specified for the
parameters, this algorithm can generate unreasonable results.
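To make the roles of S0, K, and H concrete, the following Python sketch
applies the usual one-sided cusum recurrence S(t) = MAX(0, S(t-1) + z(t) - K)
to a sequence of standardized observations and reports when it first
exceeds H. It is not algorithm AS 258, which computes the average run
length analytically; it only shows a single run. The function name and the
strict "greater than H" signal rule are assumptions.

```python
def cusum_run_length(z, k=0.5, h=5.0, s0=0.0):
    """Number of observations until a one-sided cusum signals.

    z holds standardized observations; returns None if no signal occurs.
    """
    s = s0
    for t, value in enumerate(z, start=1):
        # Accumulate deviations beyond the allowance k; never go below 0.
        s = max(0.0, s + value - k)
        if s > h:  # assumption: signal when the sum strictly exceeds h
            return t
    return None


# A sustained 1.5-sigma shift, without and with a head start (FIR).
print(cusum_run_length([1.5] * 10))
print(cusum_run_length([1.5] * 10, s0=2.0))
```

Averaging this run length over many simulated series for a given shift
would approximate one point on the ARL curve that CUSUM ARL returns.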
9) Added the following commands:
ANOP LIMITS <low> <high>
PROPORTION CONFIDENCE LIMITS Y
DIFFERENCE OF PROPORTION CONFIDENCE LIMITS Y1 Y2
to generate a confidence interval for proportions and the
difference of two proportions respectively. The ANOP
LIMITS command is used to define the lower and upper bounds
that define a success. The confidence intervals are based
on the direct binomial computations, not the normal
approximation, so it is not limited by small N.
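An exact binomial interval for a single proportion can be sketched in
Python as follows. The sketch uses the Clopper-Pearson construction solved
by bisection. Whether Dataplot uses exactly this construction is an
assumption, the function names are hypothetical, and treating the ANOP
LIMITS bounds as inclusive is also an assumption.

```python
from math import comb


def count_successes(y, low, high):
    """Number of values falling in [low, high] (inclusive, by assumption)."""
    return sum(1 for v in y if low <= v <= high)


def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))


def exact_proportion_limits(x, n, alpha=0.05):
    """Clopper-Pearson limits for x successes in n trials."""
    def bisect(func):
        # func is decreasing in p on [0, 1]; locate its zero crossing.
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2.0
            if func(mid) > 0:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2.0

    half = alpha / 2.0
    lower = 0.0 if x == 0 else bisect(
        lambda p: half - (1.0 - binom_cdf(x - 1, n, p)))
    upper = 1.0 if x == n else bisect(
        lambda p: binom_cdf(x, n, p) - half)
    return lower, upper


x = count_successes([0.2, 0.9, 0.5, 0.7, 1.4], low=0.4, high=1.0)
print(x, exact_proportion_limits(x, 5))
```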
10) Added the command
WEB HANDBOOK <keyword>
This command accesses the NIST/SEMATECH Engineering Statistics
Handbook. A beta version of the Handbook will be released
May, 1999 (http://www.itl.nist.gov/div898/handbook/).
The <keyword> is matched against a file of keywords to
go to the appropriate location in the handbook. This
command is used primarily by the Dataplot GUI, but it can
also be entered by the end-user. If you want to see a list
of the supported keywords, enter
LIST HANDBK.TEX
The handbook provides tutorial information on many common
engineering statistical capabilities. This complements the
WEB HELP command, which accesses the on-line Dataplot Reference
Manual. The on-line Reference Manual is primarily concerned
with how you implement a statistical technique while the
Handbook provides more of a statistical tutorial.
If your site has downloaded the Handbook, enter a command
like the following:
SET HANDBOOK URL http://ketone.cam.nist.gov/cf/handbook/
to define the home directory for the handbook.
The web commands SET BROWSER and SET NETSCAPE OLD apply to
the WEB HANDBOOK as well. SET BROWSER defines the browser
and SET NETSCAPE OLD allows you to use a currently open
browser for the WEB HANDBOOK command. These commands are
discussed in more detail later in this news file.
11) Added the following non-parametric tests.
a) The following are non-parametric alternatives to the
2-sample t test (i.e., test the hypothesis U1 = U2 where U1
and U2 are the population means for 2 samples).
SIGN TEST Y1 Y2
SIGN TEST Y1 Y2 D0
SIGN TEST Y1 MU
SIGNED RANK TEST Y1 Y2
SIGNED RANK TEST Y1 Y2 D0
SIGNED RANK TEST Y1 MU
RANK SUM TEST Y1 Y2
RANK SUM TEST Y1 Y2 D0
where Y1 and Y2 are the response variables and D0 and MU
are parameters. Specify D0 to test U1 - U2 = D0. The
2-sample test can also be used for the 1-sample test
U1 = MU.
The SIGN TEST and SIGNED RANK TEST commands only apply to
paired samples. The RANK SUM TEST command does not require
equal sample sizes.
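For readers unfamiliar with the paired sign test, the Python sketch below
computes a two-sided exact p-value from the signs of Y1 - Y2 - D0. It is
an illustration, not the Dataplot implementation: the function name is
hypothetical, and dropping zero differences (ties) is an assumption about
tie handling.

```python
from math import comb


def sign_test_p_value(y1, y2, d0=0.0):
    """Two-sided exact sign test for paired samples."""
    diffs = [a - b - d0 for a, b in zip(y1, y2)]
    n_pos = sum(1 for d in diffs if d > 0)
    n_neg = sum(1 for d in diffs if d < 0)
    n = n_pos + n_neg  # assumption: zero differences are discarded
    if n == 0:
        return 1.0
    k = min(n_pos, n_neg)
    # Under the null, the number of positive signs is Binomial(n, 1/2).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2.0 * tail)


print(sign_test_p_value([5, 6, 7, 8, 9, 10], [1, 2, 3, 4, 5, 6]))
```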
b) The following performs the Kruskal-Wallis non-parametric
one-way ANOVA.
KRUSKAL WALLIS Y X
where Y is the response variable and X is the factor
variable.
12) Added the following plot commands:
a) TUKEY MEAN-DIFFERENCE PLOT Y1 Y2
A Tukey mean-difference plot is an enhancement of the
quantile-quantile (q-q) plot. It converts the interpretation
of the q-q plot from the differences around a diagonal line
to the differences around a horizontal line. If T(i) and
D(i) are the vertical and horizontal coordinates for the q-q
plot, the Tukey mean-difference plot is (T(i) - D(i)) versus
(T(i) + D(i))/2. A horizontal reference line is drawn at
zero.
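The transformation from q-q coordinates to mean-difference coordinates can
be sketched in Python. This is an illustration under simplifying
assumptions: both samples have the same size (so the q-q plot is just the
sorted values paired up), T(i) comes from Y1 and D(i) from Y2, and the
function name is hypothetical.

```python
def tukey_mean_difference(y1, y2):
    """Return (mean, difference) pairs for two equal-sized samples."""
    t = sorted(y1)
    d = sorted(y2)
    if len(t) != len(d):
        raise ValueError("this sketch assumes equal sample sizes")
    # Horizontal axis: (T + D)/2; vertical axis: T - D.
    return [((ti + di) / 2.0, ti - di) for ti, di in zip(t, d)]


print(tukey_mean_difference([3, 1, 2], [2, 0, 1]))
```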
b) SPREAD LOCATION PLOT Y TAG
The spread-location (s-l) plot is a robust alternative to
the homoscedasticity plot.
Given a response variable Y and a group-id variable X,
the homoscedasticity plot is the group standard deviations
versus the group means. This is a graphical measure of
constant spread across groups.
The s-l plot has the square roots of the absolute value of
the Y(i) minus their group medians on the vertical axis and
the group medians on the horizontal axis. A reference line
connects the group medians.
When setting the LINE and CHARACTER commands, the reference
line is the first trace and the data starts with trace 2
(each group is identified as a unique trace). That is, to
draw the data points as circles and the reference line as a
solid line, do something like the following
CHARACTER CIRCLE ALL
CHARACTER BLANK
LINE BLANK ALL
LINE SOLID
SPREAD LOCATION PLOT Y X
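The coordinates of a spread-location plot can be computed as in the Python
sketch below. It is not the Dataplot implementation. The function name is
hypothetical, and the reference line is assumed to join, for each group,
the point (group median, median of the square-root absolute deviations),
which is the usual construction for this plot.

```python
from math import sqrt
from statistics import median


def spread_location(y, x):
    """Return (data points, reference line points) for an s-l plot."""
    groups = {}
    for yi, xi in zip(y, x):
        groups.setdefault(xi, []).append(yi)
    points = []
    reference = []
    for g in sorted(groups):
        med = median(groups[g])
        # Vertical coordinate: SQRT(ABS(Y(i) - group median)).
        spreads = [sqrt(abs(v - med)) for v in groups[g]]
        points.extend((med, s) for s in spreads)
        # Assumption: the reference line passes through the median spread.
        reference.append((med, median(spreads)))
    return points, reference


pts, ref = spread_location([1, 2, 3, 10, 14, 18], [1, 1, 1, 2, 2, 2])
print(ref)
```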
c) RF SPREAD PLOT
The residuals-fitted (r-f) spread plot is a graphical measure
of the goodness of fit. That is, this command is preceded
by some type of fit. It plots percent point (or quantile)
plots of the fitted values minus their mean and the residuals
arranged side by side with a common vertical scale.
The vertical spread of the residuals compared to the vertical
spread of the fitted values gives an indication of how much
of the variation is explained by the fit.
13) Added the following special functions:
a) LET A = ABRAM0(X,ORD)
This computes the Abramowitz function for order ORD.
Currently, ORD can be an integer from 0 to 100.
b) LET A = CLAUSN(X)
This computes the Clausen integral.
c) LET A = DEBEYE(X,ORD)
This computes the Debye function of order ORD. ORD
can be 1, 2, 3, or 4.
d) LET A = EXP3(X)
This computes the cubic exponential integral.
e) LET A = GOODST(X)
This computes the Goodwin and Stanton integral.
f) LET A = LOBACH(X)
This computes Lobachevski's integral.
g) LET A = SYNCH1(X)
LET A = SYNCH2(X)
This computes the synchrotron radiation functions.
h) LET A = STROM(X)
This computes Stromgren's integral.
i) LET A = TRAN(X,ORD)
This computes the transport integrals of order ORD.
ORD can be 2, 3, 4, 5, 6, 7, 8, or 9.
These special functions are computed using ACM algorithm 757.
Formulas for these functions are given in:
Allan MacLeod, "ACM Transactions on Mathematical Software",
Vol. 22, No. 3, September 1996, pp. 288-301.
14) Fixed a bug in the CD command for Unix platforms. The CD command
allows you to set the default directory.
A few other miscellaneous bugs have also been fixed.
----------------------------------------------------------------------
The following enhancements were made to DATAPLOT September - Dec 1998.
----------------------------------------------------------------------
1) Added the following MATRIX commands:
LET MEAN = MATRIX GROUP MEANS M TAG
LET SD = MATRIX GROUP SD M TAG
LET SPOOL = POOLED VARIANCE-COVARIANCE MATRIX M TAG
The MATRIX GROUP MEANS and MATRIX GROUP SD commands compute
the group means and standard deviations of a matrix.
The POOLED VARIANCE-COVARIANCE MATRIX computes a pooled
variance-covariance matrix.
These commands all operate on a matrix (M) and a group
id variable (TAG). The TAG variable has the same number of
rows as the matrix M. The values of TAG are typically integers
and they identify the group to which the corresponding row
of the matrix belongs.
The MATRIX GROUP MEANS/SD commands return a matrix with the
same number of columns as the original matrix M and with
the number of rows equal to the number of groups identified
by the TAG variable. That is, MEAN(2,3) is the mean
of the third variable of the second group.
The pooled variance-covariance matrix is computed as:
SPOOL = (1/SUM(N(i)-1)) * SUM((N(i)-1)*C(i))
where N(i) is the number of elements in group i and C(i)
is the variance-covariance matrix of the rows belonging to
group i. An earlier implementation of this command
works with 2 matrices (and no group id variable). This
version of the command still works. That is, if the second
argument to POOLED VARIANCE-COVARIANCE MATRIX command is
a matrix, it is assumed that there are 2 groups and the
data for each group is stored in a separate matrix. If the
second argument is a variable, it is assumed that it is a
group id variable and the data for all groups are stored
in a single matrix. For the 2 group case, either syntax
will work. For more than 2 groups, only the new syntax
will work.
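The group-id form of these commands can be sketched outside Dataplot as
follows. This is only an illustrative Python sketch, not Dataplot's
implementation: the function names are hypothetical, and it assumes the
usual degrees-of-freedom weighting in which each group's matrix C(i) is
weighted by N(i)-1 and every group has at least 2 rows.

```python
from collections import defaultdict

def _split_by_tag(rows, tags):
    """Collect the rows of the matrix belonging to each group id."""
    groups = defaultdict(list)
    for row, tag in zip(rows, tags):
        groups[tag].append(row)
    return groups

def group_means(rows, tags):
    """Return {group id: list of column means for that group}."""
    return {t: [sum(col) / len(g) for col in zip(*g)]
            for t, g in _split_by_tag(rows, tags).items()}

def covariance(rows):
    """Sample variance-covariance matrix (divisor n-1) of a list of rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(col) / n for col in zip(*rows)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(p)] for i in range(p)]

def pooled_covariance(rows, tags):
    """Pooled matrix: SUM((N(i)-1)*C(i)) / SUM(N(i)-1) over the groups."""
    groups = _split_by_tag(rows, tags)
    p = len(rows[0])
    dof = sum(len(g) - 1 for g in groups.values())
    pooled = [[0.0] * p for _ in range(p)]
    for g in groups.values():
        c = covariance(g)
        for i in range(p):
            for j in range(p):
                pooled[i][j] += (len(g) - 1) * c[i][j]
    return [[v / dof for v in row] for row in pooled]
```

Here rows plays the role of the matrix M (one list per row) and tags
plays the role of the TAG variable.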
2) The following control chart enhancements were added:
a) HOTELLING CONTROL CHART Y1 Y2 ... YK GROUP
This command implements a Hotelling multivariate
control chart. Given p response variables, the Hotelling
control chart computes and plots the following for each group:
T-square = n*(xbar - u0)'SINV(xbar - u0)
Here n is the size of the group, xbar is a vector of the p
sample means for the subgroup, and u0 is a vector of the
p sample means for the entire data set. That is, a 1-sample
Hotelling test is computed to test whether the means for
a given group are equal to the overall sample means.
An upper control limit (there is no lower control limit)
is drawn at the appropriate F statistic for the Hotelling
test. The value of alpha for the F test is chosen so
that alpha/(2*p) = 0.00135. This corresponds to the
3-sigma value for a univariate chart. You can specify
your own control limit, set by whatever criterion
you deem appropriate, by entering the command:
LET USL = <value>
You can control the appearance of this chart by setting
the lines and character switches. The traces are:
Trace 1 = T-square values
Trace 2 = Zero reference line
Trace 3 = Dataplot calculated control limit
Trace 4 = User specified upper control limit
For example, to draw the T-square values as a solid line
and an X, no zero reference line, the Dataplot control
limit as a dotted line, and no user specified control
limit, enter the commands:
LINE SOLID BLANK DOTTED BLANK
CHARACTER X BLANK BLANK BLANK
b) CUSUM CONTROL CHART Y X
This command implements a mean cumulative sum control
chart.
There are numerous variations on how cusum control
charts are implemented. Dataplot follows the methods
discussed by Thomas Ryan in "Statistical Methods for
Quality Improvement". Dataplot does the following:
i) Positive and negative sums are computed as follows:
SUMH(i) = MAX[0,(z(i) - k) + SUMH(i-1)]
SUML(i) = MAX[0,(-z(i) - k) + SUML(i-1)]
SUMH and SUML have initial values of 0. z(i) is
the z-score of the ith group (that is, the sub-group
mean minus the overall mean, divided by the
standard deviation of xbar).
Dataplot plots the negative of SUML. This is to
avoid overlap for the plotting of SUMH and SUML.
SUMH is plotted on the positive scale vertically and
SUML is plotted on the negative scale vertically.
The value of k is set to one half of the smallest
shift in location (in standard deviation units)
that you want to detect. Dataplot by default selects
a 1-sigma shift, that is k = 0.5. To override this,
enter the command
LET K = <value>
ii) By default, Dataplot sets the control limit at
a value of 5. That is, if one of the sums exceeds
5, the process is deemed out of control. To override
the default value, enter the command
LET H = <value>
The value for H is typically between 4 and 5.
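The recursion in i) and the limit check in ii) can be sketched in Python
as shown here. This is an illustration under stated assumptions, not
Dataplot's code: the function name is hypothetical, the z-scores are
assumed to be already computed, and a group is flagged when either sum
is strictly greater than H.

```python
def cusum(z, k=0.5, h=5.0):
    """Return (upper sums, negated lower sums, indices of out-of-control groups)."""
    sum_h = 0.0  # SUMH starts at 0
    sum_l = 0.0  # SUML starts at 0
    high, low, signals = [], [], []
    for i, zi in enumerate(z):
        sum_h = max(0.0, (zi - k) + sum_h)
        sum_l = max(0.0, (-zi - k) + sum_l)
        high.append(sum_h)
        low.append(-sum_l)  # the negative of SUML is what gets plotted
        if sum_h > h or sum_l > h:
            signals.append(i)
    return high, low, signals
```

With k = 0.5 and h = 5 (the defaults described above), a run of z-scores
well above 0.5 accumulates in the upper sum until it crosses the limit.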
3) The following command was added:
TOLERANCE LIMITS Y
This computes univariate two-sided tolerance limits for the normal
case and for the distribution free case.
Tolerance limits are a generalization of confidence limits
for the mean. However, instead of a confidence limit for a
single value, it provides confidence limits for the interval
that contains a given percentage of the data (this is called
the coverage). That is, for 90% coverage, we are finding
a confidence interval that contains 90% of the data.
4) Bug fixes:
a) The PP command was fixed for the LAHEY and Microsoft PC
versions of Dataplot.
b) Fixed the RESET VARIABLES command so that it would not
delete parameters, functions, and strings. Note that
RESET DATA still deletes them.
5) Added the percentile statistic:
LET A = <value> PERCENTILE Y
where <value> is a number between 0 and 100.
This statistic is now also supported for the following plots:
LET P100 = <value>
PERCENTILE PLOT Y X
BOOTSTRAP PERCENTILE PLOT Y
JACKNIFE PERCENTILE PLOT Y
PERCENTILE BLOCK PLOT Y
DEX PERCENTILE PLOT Y
The LET P100 = <value> command defines the percentile you
want to compute for all of these plots.
Fixed a small bug in the ...DECILE command.
6) Added the CPM and CC capability index statistics:
LET LSL = <value>
LET USL = <value>
LET TARGET = <value>
LET A = CPM Y
LET A = CC Y
These statistics are now also supported for the following plots:
LET LSL = <value>
LET USL = <value>
LET TARGET = <value>
CPM PLOT Y X
DEX CPM PLOT Y
CC PLOT Y X
DEX CC PLOT Y
The LSL, USL, and TARGET specify the lower specification,
upper specification, and target engineering limits. The
CPM is a variant of the CP and CPK capability indices and
is defined as:
CPM = (USL-LSL)/(6*SQRT(S**2+(XBAR-TARGET)**2))
where XBAR and S are the sample mean and standard deviation.
For this index, the larger the better.
The CC statistic is defined as:
CC = MAX((TARGET-XBAR)/(TARGET-LSL),(XBAR-TARGET)/(USL-TARGET))
For this index, the smaller the better.
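As a rough check of these two indices, here is a small Python sketch.
It is not Dataplot's implementation and the function names are
hypothetical. It assumes S is the sample standard deviation (divisor
n-1), and it assumes the second CC term is normalized by USL-TARGET,
by symmetry with the first term.

```python
import math
import statistics

def cpm(y, lsl, usl, target):
    """CPM = (USL-LSL) / (6*SQRT(S**2 + (XBAR-TARGET)**2)); larger is better."""
    xbar = statistics.mean(y)
    s = statistics.stdev(y)  # sample standard deviation (n-1 divisor)
    return (usl - lsl) / (6.0 * math.sqrt(s ** 2 + (xbar - target) ** 2))

def cc(y, lsl, usl, target):
    """Relative distance of the mean from the target; smaller is better."""
    xbar = statistics.mean(y)
    # Assumption: the upper term is divided by (USL - TARGET).
    return max((target - xbar) / (target - lsl),
               (xbar - target) / (usl - target))
```

For example, cpm([9, 10, 11, 12, 13], 4, 16, 10) penalizes both the
spread of the data and the offset of the mean (11) from the target (10).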
7) Added the following commands:
<dist> CHI-SQUARE GOODNESS OF FIT TEST Y
<dist> CHI-SQUARE GOODNESS OF FIT TEST Y X
<dist> CHI-SQUARE GOODNESS OF FIT TEST Y X1 X2
<dist> KOLMOGOROV-SMIRNOV GOODNESS OF FIT TEST Y
These commands test whether or not a data set
comes from a specified distribution. All distributions for
which Dataplot can generate a cdf function are supported (there
are 70+ such distributions in Dataplot). The names are identical
to the names used for the PROBABILITY PLOT command.
A couple of notes on these commands:
a) The KOLMOGOROV-SMIRNOV test is not supported for discrete
distributions.
b) The CHI-SQUARE test works with either binned or unbinned
data.
Dataplot supports 2 types of pre-binned data. If your data
has equal sized bins, then the X variable contains the
mid-point of each bin. If your bins may be of different
sizes, then the X1 variable is the lower limit of each
class and X2 is the upper limit of each class. Unequal
bins usually result from combining classes with low expected
frequency.
For unbinned data, Dataplot uses the same rules for binning as it does for the
HISTOGRAM command. That is, the class width is 0.3*S where S
is the standard deviation of Y. The upper and lower limits are
the mean plus or minus 6 times the standard deviation.
The BINNED command generates counts while the RELATIVE BINNED
generates relative frequency.
As with the histogram, you can override these defaults with the
following commands:
CLASS WIDTH <value>
CLASS LOWER <value>
CLASS UPPER <value>
c) You need to specify shape parameters for distributions that
require it. For example,
LET GAMMA = 2
GAMMA CHI-SQUARE GOODNESS OF FIT TEST Y
The parameter names are equivalent to the names used for
the PROBABILITY PLOT command.
Location and scale parameters can be specified generically
for the CHI-SQUARE and KOLMOGOROV-SMIRNOV tests respectively
by entering:
LET CHSLOC = <value>
LET CHSSCALE = <value>
LET KSLOC = <value>
LET KSSCALE = <value>
These are optional.
8) Added the following commands:
2-SAMPLE CHI-SQUARE TEST Y1 Y2
2-SAMPLE KOLMOGOROV-SMIRNOV TEST Y1 Y2
These 2 commands test whether 2 data samples come from a
common (unspecified) distribution. Y1 and Y2 do not need
to be the same size.
9) Updated the TABULATE and CROSS-TABULATE commands. The computed
group id's and the value of the statistic are written to
the file DPST1F.DAT (or dpst1f.dat on Unix). This simplifies
using the results in further analysis. For example, to
compute the group means and store them in a variable, do
something like the following:
TABULATE MEANS Y X
SKIP 1
READ DPST1F.DAT GROUPID YMEANS
SKIP 0
The CROSS-TABULATE is similar, except there are 2 group-id
variables instead of 1.
10) Added the following command:
LET Y2 X = BINNED Y (or LET Y2 X = FREQUENCY TABLE Y)
LET Y2 X = RELATIVE BINNED Y
(or LET Y2 X = RELATIVE FREQUENCY TABLE Y)
Here, Y2 will contain the counts (or frequencies) and X will
contain the bin mid-points.
This command bins your data. It uses the same rules as the
histogram. That is, the class width is 0.3*S where S is the
standard deviation of Y. The upper and lower limits are
the mean plus or minus 6 times the standard deviation.
The BINNED command generates counts while the RELATIVE BINNED
generates relative frequency.
As with the histogram, you can override these defaults with the
following commands:
CLASS WIDTH <value>
CLASS LOWER <value>
CLASS UPPER <value>
The command SET RELATIVE HISTOGRAM <AREA/PERCENT> specifies
whether or not relative binning is computed so that the area
sums to 1 or so that the frequencies sum to 1. The first option,
which is the default, is useful when using the
relative binning as an estimate of a probability distribution.
The second option is useful when you want to see what percentage
of the data falls in a given class.
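The default binning rule and the two relative options can be sketched
in Python as follows. This is an illustration only: the function name
and the relative argument are hypothetical, S is assumed to be the
sample standard deviation, and values that fall outside the mean
plus or minus 6*S range are assumed to be placed in the end classes
(Dataplot may treat them differently).

```python
import statistics

def binned(y, relative=None):
    """Return (heights, bin mid-points) using the default histogram rules.

    relative=None      -> raw counts
    relative="percent" -> heights sum to 1
    relative="area"    -> heights times class width sum to 1
    """
    mean = statistics.mean(y)
    s = statistics.stdev(y)
    width = 0.3 * s              # default class width
    lower = mean - 6.0 * s       # default lower limit
    nbins = 40                   # (12*S) / (0.3*S) classes up to mean + 6*S
    counts = [0] * nbins
    for v in y:
        idx = int((v - lower) / width)
        idx = max(0, min(idx, nbins - 1))  # assumption: clamp into end classes
        counts[idx] += 1
    mids = [lower + (i + 0.5) * width for i in range(nbins)]
    n = len(y)
    if relative == "percent":
        return [c / n for c in counts], mids
    if relative == "area":
        return [c / (n * width) for c in counts], mids
    return counts, mids
```

The "area" form is the one to compare against a probability density,
while the "percent" form reads directly as the fraction of the data in
each class.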
----------------------------------------------------------------------
The following enhancements were made to DATAPLOT June - August 1998.
----------------------------------------------------------------------
1) Added the following command:
EMPIRICAL CDF PLOT Y
This generates an empirical CDF plot.
2) Made the following enhancements to the QWIN (the Microsoft
Windows 95/NT version) device driver:
Added support for "true color". Previously, if the user
had true color set for the display, the screen colors were
all black (i.e., you couldn't see the output).
Note that true color is something you set from the
Windows 95/NT control panel, not something that Dataplot
can set automatically. That is, you set true color or
standard VGA mode from the control panel and then you
enter the appropriate Dataplot commands to support that
mode.
a) If you have your display set to true color, enter the
following commands in the C:\DPLOGF.TEX file:
SET QWIN COLOR DIRECT
DEVICE 1 QWIN
Note that the order is significant here. The color model
is set when the QWIN device is initialized, so the
SET QWIN COLOR command must come before the DEVICE 1 QWIN
command. Also, it is recommended that you put these commands
in the DPLOGF.TEX file so that you do not get the initial
blank screen where you cannot see the text that you type.
The command SET QWIN COLOR VGA resets the default.
b) For true color, the QWIN device driver supports the full
complement of colors recognized by Dataplot (HELP COLORS
for a description of the Dataplot color model). The default
VGA mode only supports 16 colors.
c) The foreground and background colors for the text window
can now be set for both standard VGA and true color modes.
The following 2 commands, if used, should be entered
after the SET QWIN COLOR <DIRECT/VGA> command and before
the DEVICE 1 QWIN command:
SET QWIN TEXT BACKGROUND COLOR <index>
SET QWIN TEXT FOREGROUND COLOR <index>
where <index> is an integer identifying the desired
color (HELP COLOR gives the index to color mapping in
Dataplot). For VGA mode, <index> is restricted to 0 to
15. For DIRECT mode, <index> is restricted to 0 to 88.
The default for both VGA and DIRECT mode is a white
foreground on a black background. The colors for the
graphics window are set by the normal Dataplot COLOR
commands (e.g., BACKGROUND COLOR BLUE, LINE COLOR RED).
3) Added the following new matrix commands:
The following 2 commands are used to obtain row or column
statistics for a matrix.
LET Y = MATRIX ROW <STAT> M
LET Y = MATRIX COLUMN <STAT> M
where <STAT> is one of: MEAN, MIDMEAN, TRIMMED MEAN,
WINSORIZED MEAN, MEDIAN, SUM, PRODUCT, SD (or STANDARD DEVIATION),
SD OF MEAN, VARIANCE, VARIANCE OF MEAN, RELATIVE VARIANCE,
RELATIVE STANDARD DEVIATION, COEFFICIENT OF VARIATION,
AVERAGE ABSOLUTE DEVIATION, MEDIAN ABSOLUTE DEVIATION, RANGE,
MIDRANGE, MAXIMUM, MINIMUM, EXTREME, LOWER HINGE, UPPER HINGE,
LOWER QUARTILE, UPPER QUARTILE, SKEWNESS, KURTOSIS,
AUTOCOVARIANCE, AUTOCORRELATION.
The following command computes an overall mean for the matrix:
LET A = MATRIX MEAN M
The following command calculates the quadratic form of a
vector and a matrix. The quadratic form is: x'Mx where x
is a vector and M is a matrix. Quadratic forms are used
frequently in multivariate statistical calculations.
LET A = QUADRATIC FORM M X
The following command is a commonly used quadratic form:
LET Y = DISTANCE FROM MEAN M
This command generates:
Di = (Xi - XMEAN)'SINV(Xi-XMEAN)
where Xi is the ith row, XMEAN is a vector of the column
means, and SINV is the inverse of the variance-covariance
matrix. That is, Di is the distance of the ith row of the
matrix from the mean. Note that in the Dataplot command, you
specify the original matrix, not the variance-covariance matrix.
The following command calculates X*X' for the vector X. The
result is a pxp matrix where p is the number of rows of X.
This computation is used in some multivariate analyses.
LET M = VECTOR TIMES TRANSPOSE X
The following command is used to create linear combinations:
LET Y2 = LINEAR COMBINATION M C
If the matrix M has p columns and n rows, C should be a vector
with p rows. This command calculates:
y2 = c(1)*M1 + c(2)*M2 + c(3)*M3 + ... + c(p)*Mp
where M1, M2, ... are the columns of the matrix. The result
is a vector with n rows.
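The linear combination above amounts to one weighted sum per row of the
matrix. A minimal Python sketch (the function name is hypothetical and
the matrix is assumed to be stored as a list of rows):

```python
def linear_combination(m, c):
    """Return y2 with y2[i] = c[0]*m[i][0] + c[1]*m[i][1] + ... for each row i."""
    if any(len(row) != len(c) for row in m):
        raise ValueError("C must have one coefficient per column of M")
    return [sum(cj * mij for cj, mij in zip(c, row)) for row in m]
```

For a matrix with n rows and p columns, the result has n entries, which
matches the dimensions described above.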
The following commands are used to calculate various distance
matrices:
LET D = EUCLIDEAN ROW DISTANCE M
LET D = EUCLIDEAN COLUMN DISTANCE M
LET D = MAHALANOBIS ROW DISTANCE M
LET D = MAHALANOBIS COLUMN DISTANCE M
LET D = MINKOWSKY ROW DISTANCE M
LET D = MINKOWSKY COLUMN DISTANCE M
LET D = CHEBYCHEV ROW DISTANCE M
LET D = CHEBYCHEV COLUMN DISTANCE M
LET D = BLOCK ROW DISTANCE M
LET D = BLOCK COLUMN DISTANCE M
It is often desirable to scale the original matrix before
calculating a distance matrix. The following commands can
be used to scale the original matrix:
SET MATRIX SCALE <NONE/MEAN/SD/RANGE/ZSCORE>
LET MSCAL = MATRIX ROW SCALE M
LET MSCAL = MATRIX COLUMN SCALE M
The SET MATRIX SCALE command is used to define the type of
scaling to perform. You can scale either across rows or down
columns.
The following command computes the pooled sample
variance-covariance matrix for two matrices:
LET MOUT = POOLED VARIANCE-COVARIANCE MATRIX MA MB
Note that MA and MB should have the same number of columns.
However, the number of rows can vary.
The following computes a 1-sample Hotelling T-square test:
LET A = 1-SAMPLE HOTELLING T-SQUARE M Y
The 1-sample Hotelling t-square tests the following hypothesis:
H0: U=U0
Here, U0 is a vector of population means. That is, the
hypothesized means for each column of the matrix. In the
above syntax, M is a matrix containing the original data
and Y is a vector containing the hypothesized means. The
returned parameter A contains the value of the Hotelling
T-square test statistic. The critical values corresponding
to alpha = .90, .95, .99, and .995 are saved in the internal
parameters B90, B95, B99, and B995.
The following computes a 2-sample Hotelling T-square test:
LET A = 2-SAMPLE HOTELLING T-SQUARE MA MB
The 2-sample Hotelling t-square tests the following hypothesis:
H0: U1=U2
Here, U1 is a vector of population means for sample 1 and
U2 is a vector of population means for sample 2. In the
above syntax, MA is a matrix containing the original data
for sample 1 and MB is a matrix containing the original data
for sample 2. MA and MB must have the same number of columns.
However, they can have a different number of rows. The
returned parameter A contains the value of the Hotelling
T-square test statistic. The critical values corresponding
to alpha = .90, .95, .99, and .995 are saved in the internal
parameters B90, B95, B99, and B995.
The following 2 commands add or delete rows of a matrix:
LET M = MATRIX ADD ROW M Y
LET M = MATRIX DELETE ROW M ROWID
Here, M is a matrix, Y is a variable with the number of rows
equal to the number of columns in M, and ROWID is a scalar
identifying the row to delete.
4) Fixed a bug in the character fill for the QWIN device
driver (DEVICE 1 QWIN for Windows 95/NT). Removed the line
CHARACTER FILL COLOR from the sample DPLOGF.TEX file (this
caused problems for Postscript output).
5) Added support for SP() in the LET STRING command. SP() will
be converted to a single space. Previously, LET STRING packed
out any spaces in the string.
6) Added the command:
LET Y2 = EXPONENTIAL SMOOTHING Y ALPHA
This performs an exponential smoothing of Y. The formula is:
Y2(1) = Y(1)
Y2(I) = ALPHA*Y(I) + (1-ALPHA)*Y2(I-1), I > 1
ALPHA is the smoothing parameter and should be greater than
0 and less than 1.
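A short Python sketch of exponential smoothing is given here for
illustration. Note the assumption: it uses the conventional recursion
in which the previous smoothed value Y2(I-1) feeds each update. The
function name is hypothetical and this is not Dataplot's code.

```python
def exponential_smoothing(y, alpha):
    """Smooth y with Y2(1) = Y(1), Y2(I) = ALPHA*Y(I) + (1-ALPHA)*Y2(I-1)."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha should be greater than 0 and less than 1")
    if not y:
        return []
    y2 = [y[0]]
    for value in y[1:]:
        # Assumption: the previous *smoothed* value is used in the update.
        y2.append(alpha * value + (1.0 - alpha) * y2[-1])
    return y2
```

Values of ALPHA near 1 track the raw data closely, while values near 0
give a heavily smoothed series.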
7) The PROBE command is used to return the values of certain
internal parameters and strings. This command was updated
so that the returned value is automatically saved. If the
returned value is an integer or real number, then the value
is stored in the internal parameter PROBEVAL. If the
returned value is a string, then the value is stored in the
internal string PROBESTR. PROBESTR and PROBEVAL can then be
used in the same way as other parameters and strings.
This feature is typically used in macros. For example, you
might want to use the machine maximum value as a "missing
value" indicator. A host independent way of using this value
would now be:
PROBE CPUMAX
LET MACHMAX = PROBEVAL
You could then use the parameter MACHMAX wherever you wanted
to define a missing value.
8) Multiplots create new 0 to 100 coordinate units for each
subplot and character sizes are scaled according to this
new subplot area. Although this is generally desirable,
sometimes the resulting character sizes are too small or
distorted if the rows to columns ratio is too far from 1.
As a convenience, the following command was added to allow
all character sizes to be scaled when multiplotting is
in effect:
MULTIPLOT SCALE FACTOR 3
MULTIPLOT SCALE FACTOR 1 2
In the first syntax, both the height and width sizes are
scaled (by 3 in this example) by the same factor. In the
second syntax, the height and width are scaled separately
(the height by 1 and the width by 2 in this example).
The word FACTOR is optional in the command.
The scale factor is multiplied by the requested size. For
example, if the title size is 2 and the scale factor is 3,
then the effective size will be 6. The scale factor is
ignored if multi-plotting is not in effect.
This command allows character sizes to be easily adjusted
for multiplots without having to enter a number of separate
size commands before the multiplot (and then after the
multiplot to return to normal values).
----------------------------------------------------------------------
The following enhancements were made to DATAPLOT January - May 1998.
----------------------------------------------------------------------
1) Reliability/Extreme Value Updates
a) Added the following commands for finding maximum likelihood
estimates for distribution parameters.
WEIBULL MAXIMUM LIKELIHOOD Y
EXPONENTIAL MAXIMUM LIKELIHOOD Y
DOUBLE EXPONENTIAL MAXIMUM LIKELIHOOD Y
NORMAL MAXIMUM LIKELIHOOD Y
LOGNORMAL MAXIMUM LIKELIHOOD Y
PARETO MAXIMUM LIKELIHOOD Y
GAMMA MAXIMUM LIKELIHOOD Y
INVERSE GAUSSIAN MAXIMUM LIKELIHOOD Y
GUMBEL MAXIMUM LIKELIHOOD Y (or EV1)
POWER MAXIMUM LIKELIHOOD Y
BINOMIAL MAXIMUM LIKELIHOOD Y
POISSON MAXIMUM LIKELIHOOD Y
At this time, only the parameter estimates are computed,
that is, no standard errors or confidence intervals for the
estimates are computed.
There are various synonyms for these commands. For example,
WEIBULL MAXIMUM LIKELIHOOD ESTIMATE Y
WEIBULL MAXIMUM LIKELIHOOD Y
WEIBULL MLE ESTIMATE Y
WEIBULL MLE Y
are all equivalent. Similar synonyms apply to the other
commands.
The exponential case is an exception in that it does
print confidence intervals. It also supports type 1 and
type 2 censored data. For example, the full sample case
is:
SET CENSORING TYPE NONE (this is the default)
EXPONENTIAL MLE Y
Type 1 censoring is censoring at a fixed time t0. This
is handled via:
SET CENSORING TYPE 1
LET TEND = <censor time>
EXPONENTIAL MLE Y
If you have data values that are censored before time t0, then
create a TAG variable with 1 for failure times and 0 for
censoring times. You would then enter:
EXPONENTIAL MLE Y TAG
Type 2 censoring is censoring after R failures have been
observed. This case is handled via:
SET CENSORING TYPE 2
EXPONENTIAL MLE Y TAG
where TAG is a variable with 1 for failure times and 0 for
censoring times.
Related to this are the commands
DEHAAN Y
CME Y
These generate parameter estimates for the generalized
Pareto distribution for extreme value applications.
b) Added the following commands:
1) LET Y = CUMULATIVE HAZARD X TAG
LET Y = HAZARD X TAG
where X is a list of failure times and TAG is an array
that identifies the value as a failure time (TAG = 1) or
a censoring time (TAG = 0).
2) LET Y = INTERARRIVAL TIMES X
where X is a list of failure times. This is similar to
the SEQUENTIAL DIFFERENCE command in that it calculates
X(I)-X(I-1). However, it sorts the data first and the
first interarrival time is set equal to X(1).
3) LET Y = CUMULATIVE AVERAGE X
LET Y = CUMULATIVE MEAN X
As the name implies, this computes the cumulative mean of
a variable. One use of this is to compute cumulative mean
time between failures for reliability data.
4) LET Y = REVERSE X
LET Y = FLIP X
This reverses the order of a variable (i.e., Y(1)=X(N),
Y(2)=X(N-1), and so on). For example, if you want to
sort from high to low instead of low to high, you can enter
LET Y = SORT X
LET Y = REVERSE Y
5) LET ALPHA = <value>
LET BETA = <value>
LET Y = POWER LAW RANDOM NUMBERS FOR I = 1 1 N
This generates N failure times from a non-homogeneous
Poisson process following the power law. That is,
M(t) = alpha*t**beta alpha, beta > 0
where M(t) is the expected number of failures at time
t. The random failure times are generated from the
formula for the interarrival times (i.e., the CDF for
the waiting time for the next failure given a failure at
time T):
F_T(t) = 1 - EXP(-ALPHA*[(T+t)**BETA - T**BETA])
where the subscript T denotes the time of the most recent failure.
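Solving the interarrival CDF above for the next failure time gives
T_next = (T**BETA - LOG(1-U)/ALPHA)**(1/BETA) for a uniform random
number U. The Python sketch below applies that inversion repeatedly,
starting from T = 0. It illustrates the technique only; the function
name and the seed argument are hypothetical, and Dataplot's own random
number generator and algorithm may differ.

```python
import math
import random

def power_law_failure_times(n, alpha, beta, seed=None):
    """Generate n successive failure times from a power law process.

    The expected number of failures by time t is alpha * t**beta.
    """
    if alpha <= 0.0 or beta <= 0.0:
        raise ValueError("alpha and beta must both be positive")
    rng = random.Random(seed)
    times = []
    t_prev = 0.0
    for _ in range(n):
        u = rng.random()  # uniform on [0, 1), so 1 - u is in (0, 1]
        # Invert F_T(t): (T+t)**beta = T**beta - log(1-u)/alpha
        t_next = (t_prev ** beta - math.log(1.0 - u) / alpha) ** (1.0 / beta)
        times.append(t_next)
        t_prev = t_next
    return times
```

With BETA less than 1 the gaps between failures tend to grow over time
(reliability improvement), and with BETA greater than 1 they tend to
shrink.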
c) The following 2 plots were added:
KAPLAN MEIER PLOT Y TAG
MODIFIED KAPLAN MEIER PLOT Y TAG
Here, Y is a list of failure times and TAG identifies censored
data. A value of 1 for TAG means that the corresponding Y
value is a failure time and a value of 0 means that the
corresponding Y value was censored. The TAG variable is
optional (if omitted, no censoring is performed).
Kaplan-Meier estimates are discussed in most texts in survival
or reliability analysis. The modified Kaplan-Meier is a
slightly adjusted form of the estimate.
The X axis of the plot is failure time and the Y axis is
an estimate of survival (or reliability). Some analysts
prefer that the Y axis be CDF estimate (i.e., 1 - Survival).
Enter the command
SET KAPLAN MEIER CDF
to specify this (and SET KAPLAN MEIER RELIABILITY to reset it).
If you want the numeric Kaplan Meier estimates, do
KAPLAN MEIER PLOT Y TAG
LET RELI = YPLOT
LET FAILTIME = XPLOT
The variables RELI and FAILTIME can be used in subsequent
commands to do further analysis.
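For readers who want to check the numeric estimates independently, the
standard (unmodified) Kaplan-Meier product-limit estimate can be
sketched in Python as below. This is the textbook estimator with a
hypothetical function name; the exact coordinates that Dataplot stores
in YPLOT and XPLOT may be arranged differently.

```python
def kaplan_meier(y, tag=None):
    """Return (failure times, survival estimates) for times y.

    tag[i] is 1 if y[i] is a failure time and 0 if it is censored.
    If tag is omitted, every value is treated as a failure.
    """
    if tag is None:
        tag = [1] * len(y)
    data = sorted(zip(y, tag))
    at_risk = len(data)
    survival = 1.0
    times, reli = [], []
    i = 0
    while i < len(data):
        t = data[i][0]
        j = i
        failures = 0
        while j < len(data) and data[j][0] == t:  # handle tied times together
            if data[j][1] == 1:
                failures += 1
            j += 1
        if failures > 0:
            survival *= 1.0 - failures / at_risk
            times.append(t)
            reli.append(survival)
        at_risk -= j - i  # failures and censored units both leave the risk set
        i = j
    return times, reli
```

The CDF form of the estimate is simply 1 minus each returned survival
value.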
d) The following plots were added:
EXPONENTIAL HAZARD PLOT Y TAG
NORMAL HAZARD PLOT Y TAG
LOGNORMAL HAZARD PLOT Y TAG
WEIBULL HAZARD PLOT Y TAG
Hazard plots are similar to probability plots. However,
they can be used with censored data and are commonly used
in reliability studies.
e) Added the following command:
DUANE PLOT Y
Given a set of failure times T, the Duane plot is
Ti/i (where i is the index from 1 to N) versus Ti on
a log-log scale. You do not need to specify XLOG ON or YLOG ON
as Dataplot does this automatically. Dataplot also resets
the original values for these switches after the Duane plot
is completed.
A line is fit to the plotted data. Various parameters from
the fit are saved as internal parameters (enter
STATUS PARAMETERS after the DUANE PLOT to see what they are).
A typical use would be:
READ FAILURE.DAT Y
Y1LABEL CUMULATIVE MEAN TIME BETWEEN FAILURE
X1LABEL FAILURE TIME
CHARACTER X BLANK
LINE BLANK SOLID
DUANE PLOT Y
JUSTIFICATION CENTER
MOVE 50 7
TEXT SLOPE OF FITTED LINE = ^BETA
MOVE 50 4
TEXT INTERCEPT OF FITTED LINE = ^ALPHA
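The quantities behind the Duane plot can be sketched in Python as
follows. This is an illustration under stated assumptions: the
function name is hypothetical, base-10 logarithms and ordinary least
squares are assumed for the fitted line, and the returned slope and
intercept are not claimed to match the parameterization Dataplot saves
in its internal parameters.

```python
import math

def duane_fit(failure_times):
    """Fit a line to log10(Ti/i) versus log10(Ti); return (slope, intercept)."""
    t = sorted(failure_times)
    x = [math.log10(ti) for ti in t]
    y = [math.log10(ti / (i + 1)) for i, ti in enumerate(t)]  # i runs 1..N
    n = len(t)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return slope, intercept
```

A positive slope means the cumulative mean time between failures is
increasing, which is the usual sign of reliability growth.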
f) The following command was added:
RELIABILITY TRENDS TEST Y
This command is used in reliability applications to determine
if repair times show a significant trend. It computes the
following 3 tests:
a) Reverse Arrangement Test
b) Military Handbook Test
c) Laplace Test
The last 2 tests require the censoring time. This is entered
(before the RELIABILITY TRENDS TEST) as:
LET TEND = <value>
The value of TEND should be greater than the maximum value
of the response variable.
Some of the Probability and Recipe updates discussed below are
also relevant to reliability applications.
2) Probability Updates
a) Added optional location and scale parameters for many of the
probability functions.
Specifically, the following functions now support both location
and scale parameters:
CAUCDF, CAUPDF, CAUPPF, CAUSF
DEXCDF, DEXPDF, DEXPPF, DEXSF
DGACDF, DGAPDF, DGAPPF
DWECDF, DWEPDF, DWEPPF
EV1CDF, EV1PDF, EV1PPF
EV2CDF, EV2PDF, EV2PPF
EWECDF, EWEPDF, EWEPPF
EXPCDF, EXPPDF, EXPPPF
FLCDF, FLPDF, FLPPF
GAMCDF, GAMPDF, GAMPPF
GEVCDF, GEVPDF, GEVPPF
GGDCDF, GGDPDF, GGDPPF
GLOCDF, GLOPDF, GLOPPF
HFCCDF, HFCPDF, HFCPPF
HFNCDF, HFNPDF, HFNPPF
IGCDF, IGPDF, IGPPF
LGACDF, LGAPDF, LGAPPF
LGNCDF, LGNPDF, LGNPPF
LLGCDF, LLGPDF, LLGPPF
LOGCDF, LOGPDF, LOGPPF
NORCDF, NORPDF, NORPPF
RIGCDF, RIGPDF, RIGPPF
WEICDF, WEIPDF, WEIPPF
NOTE: The help files and Reference Manual refer to the
location parameter for the 2-parameter inverse gaussian
(IG), reciprocal inverse gaussian (RIG), Wald (WAL), and
fatigue life (FL) distributions. This is actually the
scale parameter for these distributions.
The following added a location parameter only:
HFLCDF, HFLPDF, HFLPPF
PA2CDF, PA2PDF, PA2PPF
PARCDF, PARPDF, PARPPF
PEXCDF, PEXPDF, PEXPPF
PLNCDF, PLNPDF, PLNPPF
PNRCDF, PNRPDF, PNRPPF
VONCDF, VONPDF, VONPPF
WALCDF, WALPDF, WALPPF
WCACDF, WCAPDF, WCAPPF
The following added a scale parameter only:
GEPCDF, GEPPDF, GEPPPF
POWCDF, POWPDF, POWPPF
The following added a lower and upper limit (which is then
converted by Dataplot into location and scale parameters).
UNICDF, UNIPDF, UNIPPF, UNISF
BETCDF, BETPDF, BETPPF, BETSF
b) Added the following hazard and cumulative hazard functions:
NOTE: In the following, LOC and SCALE specify location and
scale parameters respectively and are optional. For the
uniform, the lower and upper limits are specified (and
are converted by Dataplot to location and scale
parameters) and are also optional. All other parameters
are the standard shape parameters for the distribution.
UNIHAZ(X,LOWER,UPPER) - uniform hazard function
UNICHAZ(X,LOWER,UPPER) - uniform cumulative hazard function
NORHAZ(X,LOC,SCALE) - normal hazard function
NORCHAZ(X,LOC,SCALE) - normal cumulative hazard function
LGNHAZ(X,SD,LOC,SCALE) - lognormal hazard function
LGNCHAZ(X,SD,LOC,SCALE) - lognormal cumulative hazard function
PNRHAZ(X,SD,P,LOC) - power normal hazard function
PNRCHAZ(X,SD,P,LOC) - power normal cumulative hazard
function
PLNHAZ(X,SD,P,LOC) - power log-normal hazard function
PLNCHAZ(X,SD,P,LOC) - power log-normal cumulative
hazard function
EXPHAZ(X,LOC,SCALE) - exponential hazard function
EXPCHAZ(X,LOC,SCALE) - exponential cumulative hazard
function
WEIHAZ(X,GAMMA,LOC,SCALE) - Weibull hazard function
WEICHAZ(X,GAMMA,LOC,SCALE) - Weibull cumulative hazard
function
EWEHAZ(X,GAMMA,THETA,LOC,SCALE) - exponentiated Weibull
hazard function
EWECHAZ(X,GAMMA,THETA,LOC,SCALE) - exponentiated Weibull
cumulative hazard function
GAMHAZ(X,GAMMA,LOC,SCALE) - gamma hazard function
GAMCHAZ(X,GAMMA,LOC,SCALE) - gamma cumulative hazard function
IGAHAZ(X,GAMMA,LOC,SCALE) - inverted gamma hazard function
IGACHAZ(X,GAMMA,LOC,SCALE) - inverted gamma cumulative hazard
function
GGDHAZ(X,GAMMA,K,LOC,SCALE) - generalized gamma hazard
function
GGDCHAZ(X,GAMMA,K,LOC,SCALE) - generalized gamma cumulative
hazard function
EV1HAZ(X,GAMMA,LOC,SCALE) - Gumbel hazard function
EV1CHAZ(X,GAMMA,LOC,SCALE) - Gumbel cumulative hazard
function
EV2HAZ(X,GAMMA,LOC,SCALE) - Frechet hazard function
EV2CHAZ(X,GAMMA,LOC,SCALE) - Frechet cumulative hazard
function
GEPHAZ(X,GAMMA,SCALE) - generalized Pareto hazard
function
GEPCHAZ(X,GAMMA,SCALE) - generalized Pareto cumulative
hazard function
IGHAZ(X,GAMMA,LOC,SCALE) - inverse gaussian hazard function
IGCHAZ(X,GAMMA,LOC,SCALE) - inverse gaussian cumulative
hazard function
WALHAZ(X,GAMMA,LOC) - Wald hazard function
WALCHAZ(X,GAMMA,LOC) - Wald cumulative hazard function
RIGHAZ(X,GAMMA,LOC,SCALE) - reciprocal inverse gaussian
hazard function
RIGCHAZ(X,GAMMA,LOC,SCALE) - reciprocal inverse gaussian
cumulative hazard function
FLHAZ(X,GAMMA,LOC,SCALE) - fatigue life hazard function
FLCHAZ(X,GAMMA,LOC,SCALE) - fatigue life cumulative hazard
function
PARHAZ(X,GAMMA,LOC) - Pareto hazard function
PARCHAZ(X,GAMMA,LOC) - Pareto cumulative hazard
function
ALPHAZ(X,ALPHA,BETA) - alpha hazard function
ALPCHAZ(X,ALPHA,BETA) - alpha cumulative hazard function
PEXHAZ(X,ALPHA,BETA) - exponential power hazard function
PEXCHAZ(X,ALPHA,BETA) - exponential power cumulative
hazard function
NOTE: The hazard function is defined as:
h(x) = pdf(x)/(1-cdf(x))
and the cumulative hazard function is defined as:
H(x) = -log(1-cdf(x))
where pdf and cdf are the probability density and
cumulative distribution functions respectively. These
functions can be used to generate hazard and cumulative
hazard functions for distributions that Dataplot does
not support directly.
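The two definitions in the note translate directly into code. The
Python sketch below uses an exponential distribution as the example
because its hazard is constant (1/scale) and its cumulative hazard is
x/scale, which makes the result easy to check. All function names are
hypothetical.

```python
import math

def hazard(pdf, cdf, x):
    """h(x) = pdf(x) / (1 - cdf(x))."""
    return pdf(x) / (1.0 - cdf(x))

def cumulative_hazard(cdf, x):
    """H(x) = -log(1 - cdf(x))."""
    return -math.log(1.0 - cdf(x))

def exp_pdf(x, scale=2.0):
    """Exponential density with the given scale (location 0)."""
    return math.exp(-x / scale) / scale

def exp_cdf(x, scale=2.0):
    """Exponential cumulative distribution function."""
    return 1.0 - math.exp(-x / scale)
```

The same two helper functions work for any distribution for which a
pdf and cdf are available.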
c) Added the mixture of 2 normal probability functions.
Specifically,
NORMXCDF(X,U1,SD1,U2,SD2,PMIX)
NORMXPDF(X,U1,SD1,U2,SD2,PMIX)
NORMXPPF(P,U1,SD1,U2,SD2,PMIX)
where U1 and SD1 are the mean and standard deviation of the
first normal distribution, U2 and SD2 are the mean and standard
deviation of the second normal distribution, and PMIX is
the mixing proportion (between 0 and 1).
You can generate a probability plot as follows:
LET U1 = <value>
LET SD1 = <value>
LET U2 = <value>
LET SD2 = <value>
LET P = <value>
NORMAL MIXTURE PROBABILITY PLOT Y
You can generate random numbers as follows:
LET U1 = <value>
LET SD1 = <value>
LET U2 = <value>
LET SD2 = <value>
LET P = <value>
LET Y = NORMAL MIXTURE RANDOM NUMBERS FOR I = 1 1 1000
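The mixture functions can be sketched in Python as shown below. This
is an illustration, not Dataplot's code. It assumes PMIX is the weight
on the first normal component (so 1-PMIX is the weight on the second),
and it assumes a simple bisection search for the ppf. The function
names are hypothetical.

```python
import math

def norm_pdf(x, u, sd):
    """Normal density with mean u and standard deviation sd."""
    z = (x - u) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def norm_cdf(x, u, sd):
    """Normal cumulative distribution function via the error function."""
    return 0.5 * (1.0 + math.erf((x - u) / (sd * math.sqrt(2.0))))

def normx_pdf(x, u1, sd1, u2, sd2, pmix):
    # Assumption: pmix weights the first component.
    return pmix * norm_pdf(x, u1, sd1) + (1.0 - pmix) * norm_pdf(x, u2, sd2)

def normx_cdf(x, u1, sd1, u2, sd2, pmix):
    return pmix * norm_cdf(x, u1, sd1) + (1.0 - pmix) * norm_cdf(x, u2, sd2)

def normx_ppf(p, u1, sd1, u2, sd2, pmix):
    """Invert the mixture cdf by bisection (the mixture has no closed-form ppf)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    lo = min(u1 - 10.0 * sd1, u2 - 10.0 * sd2)
    hi = max(u1 + 10.0 * sd1, u2 + 10.0 * sd2)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if normx_cdf(mid, u1, sd1, u2, sd2, pmix) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Because the mixture cdf is a weighted sum of two increasing functions,
it is itself increasing, which is what makes the bisection search valid.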
d) Added the inverted gamma probability functions:
IGACDF(X,GAMMA,LOC,SCALE)
IGAPDF(X,GAMMA,LOC,SCALE)
IGAPPF(P,GAMMA,LOC,SCALE)
This is not really a new function. It is simply the
generalized gamma function with the second shape parameter
set to -1. We added it as a separate set of functions since
it is a common distribution in certain applications.
Also added:
LET GAMMA = <value>
INVERSE GAMMA PROBABILITY PLOT
INVERSE GAMMA PPCC PLOT
e) Added the following discrete PPCC PLOT commands:
BINOMIAL PPCC PLOT
NEGATIVE BINOMIAL PPCC PLOT
LOGARITHMIC SERIES PPCC PLOT
For the binomial and negative binomial, N must be specified
(and then P is computed).
f) Fixed the PROBABILITY PLOT X Y and PPCC PLOT X Y commands
to handle zero count bins correctly.
3) Recipe Updates
a) Added support for multi-factor recipe fits. For example,
a common model is:
Y = A0 + A1*X1 + A2*X1**2 + A3*X2 + A4*X2**2 + A5*X1*X2
In Dataplot, the recipe analysis could be done as follows:
READ FILE.DAT Y X1 X2 BATCH
READ FILE2.DAT XP1 XP2
LET X1S = X1*X1
LET X2S = X2*X2
LET X1X2 = X1*X2
LET XP1S = XP1*XP1
LET XP2S = XP2*XP2
LET XP1P2 = XP1*XP2
.
RECIPE FIT FACTORS 5
RECIPE FIT Y X1 X1S X2 X2S X1X2 BATCH XP1 XP1S XP2 XP2S XP1P2
PRINT TOL
XP1 and XP2 are the points at which you want the tolerance
values computed. If they are omitted, then the tolerance
values are computed at the unique points in the design
matrix (i.e., all the unique combinations of X1 and X2).
The BATCH variable is a batch identifier and is optional.
X1 and X2 must have the same number of points and XP1 and
XP2 should have the same number of points. However, X1 and
XP1 do not need to have the same number of points (and they
usually will not). The primary output from the RECIPE command
is the tolerance values (by default, saved in TOL). Commands
for setting the probability confidence and content are
the same as for the 1-factor recipe fit.
b) Recipe is generally used in the context of setting tolerance
limits as defined in the MIL-17 Handbook. A number of other
statistical techniques are defined in this handbook.
Dataplot had previously added support for the Grubbs test,
Levene's test for shifts in scale, and the F test for shifts
in location. The following additional tests defined in the
handbook are now supported as well:
ANDERSON-DARLING <DIST> TEST Y
where DIST is: NORMAL, LOGNORMAL, WEIBULL, EXTREME VALUE
ANDERSON-DARLING K-SAMPLE TEST Y X
WEIBULL MAXIMUM LIKELIHOOD Y
B BASIS <DIST> TOLERANCE LIMIT Y
A BASIS <DIST> TOLERANCE LIMIT Y
where DIST is: NORMAL, LOGNORMAL, WEIBULL, NON-PARAMETRIC
The Anderson-Darling 1-sample test is used to determine if a
data set can be assumed to come from a certain distribution.
The EXTREME VALUE distribution is the type 1 extreme value
distribution. The k-sample Anderson-Darling test is used
to test if groups of data are the same (in the sense of
coming from the same distribution with common location and
scale). It is typically used to determine if data coming
from different batches can be treated as if they came from the
same batch. The WEIBULL MAXIMUM LIKELIHOOD command is used
to generate maximum likelihood estimates of the 2-parameter
Weibull distribution (the shape and scale parameters).
The B BASIS and A BASIS commands are used to generate
B basis and A basis tolerance limits for a variable
for a few common distributions.
See the MIL-17 Handbook for more information on these
techniques.
4) Matrix Updates
Modified the matrix commands to make more efficient use of
storage. Raised the default maximum number of rows from 1,500 to
3,000.
Added the DIMENSION MATRIX COLUMNS <val> and DIMENSION MATRIX ROWS
<val> commands. These are used to dimension temporary matrices
in the matrix routines. Note that unlike the DIMENSION command
for variables, this command does not erase any previously
created data. It is only used to dimension temporary matrices
in the matrix code, not to store the original data.
Each temporary matrix has a maximum of 920,000/3 elements.
However, you cannot dimension the number of rows in a matrix
to be greater than the number of rows in a variable.
5) Miscellaneous Updates
a) Added the commands:
LINE <SAVE/RESTORE>
CHARACTER <SAVE/RESTORE>
These were motivated by the graphical user interface, but they
can be used directly by the user as well.
b) Added the commands:
SET PRINTER <id>
PROBE PRINTER <id>
These allow the user to specify the printer name for the
PP command. It is currently supported for the Unix and
Windows 95/NT versions. It would be straightforward to support
on other systems as well.
c) The ANOVA code was significantly rewritten.
1) The maximum number of factors was increased from 5 to 10.
2) The output was modified. Specifically, an ANOVA table was
added and other output was rearranged.
3) Some information is now written out to files DPST1F.DAT
and DPST2F.DAT. This is useful if you need to use some
of the ANOVA quantities in further analysis.
4) A check is now made to see if you have a balanced design
(i.e., all cells have an equal number of observations).
A warning message will be printed if an unbalanced case is
detected. Note that the Dataplot calculations are based on
the assumption of balanced data. However, it will still
run the ANOVA for the unbalanced case (the output will
not be accurate in this case).
d) Added CODED as synonym for CODE (LET Y = CODE X or
LET Y = CODED X).
e) Modified data reads so that non-printing characters are
converted to spaces.
f) The BOOTSTRAP PLOT command was augmented so that the following
parameters are now automatically saved:
BMEAN - mean of the plotted bootstrap values
BSD - standard deviation of the plotted bootstrap values
B001 - the 0.1% percentile of the plotted bootstrap values
B005 - the 0.5% percentile of the plotted bootstrap values
B01 - the 1.0% percentile of the plotted bootstrap values
B025 - the 2.5% percentile of the plotted bootstrap values
B05 - the 5.0% percentile of the plotted bootstrap values
B10 - the 10% percentile of the plotted bootstrap values
B20 - the 20% percentile of the plotted bootstrap values
B80 - the 80% percentile of the plotted bootstrap values
B90 - the 90% percentile of the plotted bootstrap values
B95 - the 95% percentile of the plotted bootstrap values
B975 - the 97.5% percentile of the plotted bootstrap values
B99 - the 99% percentile of the plotted bootstrap values
B995 - the 99.5% percentile of the plotted bootstrap values
B999 - the 99.9% percentile of the plotted bootstrap values
These values are typically used in setting confidence levels.
Also, the BOOTSTRAP COEFFICIENT OF VARIATION PLOT and
BOOTSTRAP RELATIVE VARIANCE PLOT commands were added.
g) Some code not used by the user was added for the graphical
front-end.
h) Raised the maximum number of lines in a loop from 200 to 500.
i) Fixed some minor bugs.
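To show how the saved bootstrap parameters in item f) are typically used, the following Python sketch bootstraps the sample mean and reports a few quantities under the same names (BMEAN, BSD, B025, B975). This is only an illustration under stated assumptions: the nearest-rank percentile rule, the number of resamples, the seed, and the sample data are all hypothetical, and Dataplot's own percentile definition may differ.

```python
import math
import random
import statistics

def nearest_rank(sorted_values, pct):
    """Nearest-rank percentile of an already sorted list (pct in 0..100)."""
    rank = max(1, math.ceil(pct / 100.0 * len(sorted_values)))
    return sorted_values[rank - 1]

def bootstrap_summary(data, n_boot=1000, seed=12345):
    """Resample with replacement and summarize the bootstrap means."""
    rng = random.Random(seed)
    means = sorted(
        statistics.fmean(rng.choices(data, k=len(data))) for _ in range(n_boot)
    )
    return {
        "BMEAN": statistics.fmean(means),
        "BSD": statistics.stdev(means),
        "B025": nearest_rank(means, 2.5),
        "B975": nearest_rank(means, 97.5),
    }

data = [9.8, 10.1, 10.4, 9.9, 10.0, 10.3, 9.7, 10.2]
summary = bootstrap_summary(data)
# B025 and B975 bracket an approximate 95% percentile interval for the mean.
print(summary["B025"], summary["B975"])
```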
----------------------------------------------------------------------
The following enhancements were made to DATAPLOT October - December 1997.
----------------------------------------------------------------------
1) The WRITE command was updated to allow
WRITE VARIABLES ALL (or WRITE ALL VARIABLES)
This was added to support some updates to the frontend, but it
can be used in the command line as well. Currently, a maximum
of 25 variables will be printed.
2) An update was made to allow exponential notation in commands
where a number or parameter is expected. For example,
LET Y = DATA 1.2E-7 2.0E3 4.26E+4
The above example shows the 3 forms of the E notation that
are currently recognized. Note that using "D" instead of
"E" is not currently supported.
Exponential notation within expressions (e.g., transformations
under LET, definition of functions, FIT expressions) is not yet
supported. That is,
LET Y(1) = 1.2E-3
does NOT work yet. The parsing of expressions under
LET is handled in a different part of the code. Support
may be added at a later time.
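For reference, the three E-notation forms shown above denote the values 0.00000012, 2000, and 42600. The short Python sketch below parses the same tokens. It is not Dataplot code, and the helper name is hypothetical. Python's float conversion happens to reject a "D" exponent as well, which mirrors the restriction described in this item.

```python
def parse_numbers(text):
    """Convert whitespace-separated tokens such as 1.2E-7 to floats."""
    return [float(token) for token in text.split()]

values = parse_numbers("1.2E-7 2.0E3 4.26E+4")
print(values)

# A "D" exponent is not a valid float literal in Python either.
try:
    float("1.2D-7")
    d_accepted = True
except ValueError:
    d_accepted = False
print(d_accepted)
```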
3) The command SKIP AUTOMATIC or SKIP ---- can be used to
skip all lines in a data file until the first line
containing a "----" string is found. It does not have to
start in column 1. This was added primarily
to support the data files provided with Dataplot. However,
you can use this with your own data files as well.
If no line with "----" is found, Dataplot rewinds the file
and tries to read data starting with the first line of the
file.
This option only applies if the read is performed on a file.
If the read is from the terminal, SKIP AUTOMATIC is
equivalent to a SKIP 0.
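The skipping rule described in this item can be sketched in Python as follows. This is an illustration of the described behavior, not Dataplot's implementation. The function name is hypothetical, and it assumes that the data begin on the line immediately after the marker line.

```python
def lines_after_marker(lines, marker="----"):
    """Return the lines following the first line that contains the marker.

    If no line contains the marker, return all lines, which corresponds
    to rewinding the file and reading from the first line.
    """
    for index, line in enumerate(lines):
        if marker in line:  # the marker may start in any column
            return lines[index + 1:]
    return list(lines)

with_header = ["Sample data file", "   Y    X", "  ------", "1 2", "3 4"]
without_header = ["1 2", "3 4"]
print(lines_after_marker(with_header))
print(lines_after_marker(without_header))
```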
4) The following 2 commands were added:
AUTOCOMOVEMENT PLOT Y
CROSS COMOVEMENT PLOT Y1 Y2
These are similar to the AUTOCORRELATION PLOT and the
CROSS CORRELATION PLOT commands. However, they are based
on the COMOVEMENT statistic rather than the correlation
statistic. At this time, no reference lines indicating
statistical significance are drawn.
5) The following special function was added:
LET A = PSIFN(X,K) - scaled k-th derivative of the PSI (or
DIGAMMA) function
Note that this computes a SCALED version of the function,
specifically
((-1)**(K+1)/GAMMA(K+1))*PSI(X,K)
where GAMMA is the gamma function and PSI(X,K) is the unscaled
function. Also, it is the k-th derivative of PSI, not of
the log gamma function. That is, K=1 computes the
trigamma function, not the digamma function.
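For integer K >= 1 and X > 0, the scaling above turns the k-th derivative of PSI into the simple series SUM[n=0 to infinity] 1/(X+n)**(K+1), since the k-th derivative of psi equals (-1)**(K+1) * K! times that sum. The Python sketch below evaluates the series directly as a way to check values. It is not the algorithm Dataplot uses. The function name, the number of terms, and the midpoint tail correction are assumptions made for this illustration.

```python
import math

def scaled_polygamma(x, k, terms=10000):
    """Approximate ((-1)**(k+1)/k!) * psi^(k)(x) for x > 0 and integer k >= 1."""
    # Sum the leading terms of the series explicitly.
    head = sum(1.0 / (x + n) ** (k + 1) for n in range(terms))
    # Approximate the remaining terms by an integral (midpoint rule).
    tail = 1.0 / (k * (x + terms - 0.5) ** k)
    return head + tail

# K = 1 gives the trigamma function; trigamma(1) = pi**2 / 6.
trigamma_at_1 = scaled_polygamma(1.0, 1)
print(trigamma_at_1, math.pi ** 2 / 6.0)
```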
6) The DELETE command was modified so that blanked out values
are reset to zero instead of machine negative infinity.
7) Added IF EXIST command. An IF NOT EXIST command was added
several years ago. This command works as follows:
IF A EXIST
PRINT A
END OF IF
where A is a parameter. A will be printed if it already
exists.
8) Added the command REPLOT to regenerate the most recently
created plot. Although this was motivated by enhancements
to the graphical user interface, it can be useful in command
line mode as well.
----------------------------------------------------------------------
The following enhancements were made to DATAPLOT September 1997.
----------------------------------------------------------------------
1) Added a SLEEP <n> command to pause for <n> seconds. This is useful for
macros so plots can be displayed for a given period of time without
requiring user intervention to continue (as needed by the PAUSE command).
This command is platform dependent and is currently implemented for Unix
and Windows 95/NT versions.
Added a CD command to change the current directory. This command is
platform dependent and has currently been implemented for the
Windows 95/NT version. This command is particularly useful for the
Windows 95/NT version since when Dataplot is executed from a screen
icon, the default directory is the directory where the Dataplot
executable resides. The SYSTEM command cannot be used to change the
current directory since a "SYSTEM CD <directory>" does not persist
after the SYSTEM command completes execution.
2) Added Mark Vangel's RECIPE code. RECIPE stands for "REgression Confidence
Intervals on PErcentiles". It is used to calculate basis values for
regression models with or without a random "batch effect".
A full discussion of RECIPE is beyond the scope of this brief news item.
Complete technical documentation for RECIPE is available at the following
Web site:
http://www.itl.nist.gov/div898/software/recipe/
This discusses RECIPE in general, not the Dataplot implementation.
The basic RECIPE commands are:
RECIPE FIT Y X BATCH XPRED - linear regression, polynomial models
RECIPE ANOVA Y X1 ... XK BATCH - ANOVA, multilinear models
The primary output from the RECIPE command is a set of tolerance values.
These are saved in the internal Dataplot variable TOL by default. This
variable can be plotted and manipulated like any other Dataplot variable.
The RECIPE documentation (on the above web site) also discusses a program
called SIMCOV. SIMCOV is used to determine whether or not the Satterthwaite
approximation is adequate in determining the tolerance values. SIMCOV
uses simulation to determine this. The following commands implement
the SIMCOV program in Dataplot.
RECIPE SIMCOV FIT Y X BATCH XPRED - linear regression, polynomial models
RECIPE SIMCOV ANOVA Y X1 ... XK BATCH - ANOVA, multilinear models
The following commands set switches for the RECIPE and SIMCOV analyses.
RECIPE FIT DEGREE <N> - polynomial degree for RECIPE FIT
RECIPE FACTORS <N> - number of factors for RECIPE ANOVA
RECIPE OUTPUT <VAR> - name of variable to contain computed
tolerance values
RECIPE SATTERTHWAITE <YES/NO> - specifies whether or not Satterthwaite
approximation is used
RECIPE PROBABILITY CONTENT <VAL> - value for probability content
RECIPE CONFIDENCE <VAL> - value for confidence level
RECIPE CORRELATION <N> - the number of correlation values at
which to compute SIMCOV probabilities
RECIPE SIMCOV REPLICATES <N> - the number of replications for SIMCOV
RECIPE SIMPVT REPLICATES <N> - the number of replications for SIMPVT
(applies when Satterthwaite
approximation not used)
In addition, the following commands were added to support RECIPE
analyses (these techniques are recommended by MIL-HDBK-17E):
GRUBB TEST Y - performs the Grubbs test for outliers
LEVENE TEST Y X - performs the Levene test for homogeneous variances
(similar to Bartlett's test, but more robust for
non-normal distributions)
F LOCATION TEST Y X - performs an F test for homogeneous locations
These capabilities were originally implemented as the macros GRUBB.DP, LEVENE.DP,
and FTESTLOC.DP which have been added to the Dataplot macro directory.
In addition, four data sets (VANGEL31.DAT, VANGEL32.DAT, VANGEL33.DAT, and
VANGEL34.DAT) that can be analyzed with RECIPE were added to the Dataplot
data sets directory. Corresponding macros (VANGEL31.DP, VANGEL32.DP, VANGEL33.DP,
and VANGEL34.DP) were added to the Dataplot programs directory.
3) The following control charts were added:
EWMA CONTROL CHART Y - exponentially weighted moving average control chart
EWMA CONTROL CHART Y X - exponentially weighted moving average control chart
MOVING AVERAGE CONTROL CHART Y - moving average control chart
MOVING AVERAGE CONTROL CHART Y X - moving average control chart
MOVING RANGE CONTROL CHART Y - moving range control chart
MOVING RANGE CONTROL CHART Y X - moving range control chart
MOVING SD CONTROL CHART Y - moving standard deviation control chart
MOVING SD CONTROL CHART Y X - moving standard deviation control chart
These work in a similar fashion to previously available control charts.
An important feature of all control charts was omitted from previous
documentation (this feature has actually been available for quite some time).
Dataplot allows you to specify the target and lower and upper
control limits by entering the commands:
LET TARGET = <value> - the target value
LET USL = <value> - the upper control limit
LET LSL = <value> - the lower control limit
The data is drawn as trace 1, the target value and limits derived from the
data are drawn as traces 2, 3, and 4, and the user specified target and
control limits (if given) are drawn as traces 5, 6, and 7. You can control
which of these values are actually plotted by setting the LINE and CHARACTER
commands appropriately.
4) The REPEAT GRAPH, SAVE GRAPH, and LIST GRAPH commands that were previously
added for X11 installations have been extended to support the Microsoft
Windows 95/NT implementation. The commands work on Windows 95/NT as they
do for Unix. The primary difference is that the plots are saved in
Windows bitmap format. The Windows 95/NT version still needs a little tidying up
(the default positioning isn't ideal yet), but it is functional.
5) The following special functions were added:
LET A = CGAMMA(XR,XC) - real component of complex gamma
LET A = CGAMMAI(XR,XC) - imaginary component of complex gamma
LET A = CLNGAM(XR,XC) - real component of complex log gamma
LET A = CLNGAMI(XR,XC) - imaginary component of complex log gamma
LET A = CBETA(AR,AC,BR,BC) - real component of complex beta
LET A = CBETAI(AR,AC,BR,BC) - imaginary component of complex beta
LET A = CLNBETA(AR,AC,BR,BC) - real component of complex log beta
LET A = CLNBETAI(AR,AC,BR,BC) - imaginary component of complex log beta
LET A = CPSI(XR,XC) - real component of complex psi
LET A = CPSII(XR,XC) - imaginary component of complex psi
LET A = CHM(X,A,B) - confluent hypergeometric M function
LET A = HYPERGEO(X,A,B,C) - hypergeometric function (for restricted values of X,
convergent case x < 1)
LET A = PBDV(X,A) - parabolic cylinder function (Dv)
LET A = PBDV1(X,A) - derivative of parabolic cylinder
function (Dv)
LET A = PBVV(X,A) - parabolic cylinder function (Vv)
LET A = PBVV1(X,A) - derivative of parabolic cylinder
function (Vv)
LET A = PBWA(X,A) - parabolic cylinder function (Wa) (only for X < 5)
LET A = PBWA1(X,A) - derivative of parabolic cylinder
function (Wa) (only for X < 5)
LET A = BER(XR) - Real component of Kelvin Ber function
LET A = BERI(XR) - Imaginary component of Kelvin Ber function
LET A = BER1(XR) - Real component of derivative of Kelvin Ber
function
LET A = BERI1(XR) - Imaginary component of derivative of Kelvin Ber
function
LET A = KER(XR) - Real component of Kelvin Ker function
LET A = KERI(XR) - Imaginary component of Kelvin Ker function
LET A = KER1(XR) - Real component of derivative of Kelvin Ker
function
LET A = KERI1(XR) - Imaginary component of derivative of Kelvin Ker
function
LET A = ZETA(S) - Riemann zeta function - 1 (s > 1)
LET A = ETA(S) - eta function - 1 (s >= 1)
LET A = CATLAN(S) - Catalan Beta function - 1 (s >= 1)
LET A = BINOMIAL(N,M) - Binomial coefficient of N and M
LET A = BINOM(N,M) - Binomial coefficient of N and M
LET A = EN(N) - Euler number of order N
LET A = EN(X,N) - Euler polynomial of order N
LET A = BN(N) - Bernoulli number of order N
LET A = BN(X,N) - Bernoulli polynomial of order N
LET A = BERNOULLI NUMBERS FOR I = 1 1 N - Bernoulli numbers
LET A = EULER NUMBERS FOR I = 1 1 N - Euler numbers
----------------------------------------------------------------------
The following enhancements were made to DATAPLOT July 1997.
----------------------------------------------------------------------
1) Added support for printing tic mark labels in exponential
format for linear scales. Enter the command
...TIC MARK LABEL FORMAT EXPONENTIAL
The default is to write the number with an E15.7 format.
To control the number of decimal points, enter the command
...TIC MARK LABEL DECIMAL <n>
where <n> is a positive integer. For example, if
<n> is 4, the number is printed with an E12.4 format.
2) The diagrammatic graphics commands that draw a figure
(AND, AMPLIFIER, ARC, ARROW, BOX, CAPACITOR, CIRCLE, DIAMOND,
CUBE, ELLIPSE, GROUND, HEXAGON, INDUCTOR, LATTICE, NOR, OR,
OVAL, PYRAMID, POINT, RESISTOR, SEMI-CIRCLE, TRIANGLE)
were updated to include a "DATA" option (similar to the
DRAWDATA and MOVEDATA commands). This "DATA" option draws the
plot in units of the most recent plot rather than 0 to 100
screen units. For example, ELLIPSE DATA <list of points>
draws the ellipse in units of the most recent plot.
Similar to the DATA option, there is a RELATIVE option in the
above commands. Although this capability has actually been
available in Dataplot for quite some time, it was left out
of the documentation for the diagrammatic graphics commands.
Relative drawing means that the first point is drawn in
absolute units and all subsequent points are relative to the
prior point. For example DRAW RELATIVE 10 10 2 3
would draw a line from (10,10) to (12,13).
The word "DATA" should come before the word "RELATIVE"
in these commands. There are actually 4 forms to these
commands. For example,
ELLIPSE X1 Y1 X2 Y2 X3 Y3
ELLIPSE DATA X1 Y1 X2 Y2 X3 Y3
ELLIPSE RELATIVE X1 Y1 X2 Y2 X3 Y3
ELLIPSE DATA RELATIVE X1 Y1 X2 Y2 X3 Y3
The first form draws in absolute screen 0 to 100 units,
the second form draws in absolute units of the most recent plot,
the third form draws in relative screen 0 to 100 units, and
the fourth form draws in relative units of the most recent plot.
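The relative coordinate rule described in this item can be sketched in Python as follows. The function name is hypothetical and this is not Dataplot code. It only converts a flat list of coordinates, as given to a RELATIVE form of a command, into the absolute points that would be drawn.

```python
def relative_to_absolute(coords):
    """Convert x1, y1, dx2, dy2, ... into a list of absolute (x, y) pairs.

    The first pair is taken as absolute; each later pair is an offset
    from the previous point.
    """
    points = []
    x = y = 0
    for i in range(0, len(coords), 2):
        if i == 0:
            x, y = coords[0], coords[1]
        else:
            x, y = x + coords[i], y + coords[i + 1]
        points.append((x, y))
    return points

# Matches the DRAW RELATIVE 10 10 2 3 example: a line from (10,10) to (12,13).
print(relative_to_absolute([10, 10, 2, 3]))
```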
3) POLYGON was added to the list of diagrammatic commands. This
command takes the following form:
POLYGON X Y <SUBSET/EXCEPT/FOR qualification>
POLYGON DATA X Y <SUBSET/EXCEPT/FOR qualification>
POLYGON RELATIVE X Y <SUBSET/EXCEPT/FOR qualification>
POLYGON DATA RELATIVE X Y <SUBSET/EXCEPT/FOR qualification>
The first form plots the polygon in 0 to 100 screen units while
the second form plots the data in units of the most recent plot.
The third and fourth forms are similar, but they use relative
coordinates (the first coordinate pair is in absolute units,
the remaining are coordinates relative to the previous point).
Note that X and Y are arrays, not lists of points as used by
the other diagrammatic graphics commands. Since these are
arrays, the SUBSET, EXCEPT, and FOR qualifications can be
applied to the list of points, although this is not common
in the context of this command.
Setting the last point to the first point (i.e., closing the
polygon) is not required since Dataplot does this automatically.
As with the other diagrammatic graphics commands, the attributes
of the border of the polygon are set via the first setting
of the LINE commands (e.g., LINE DASH, LINE COLOR BLUE, LINE
THICKNESS 0.3). The attributes of the interior of the polygon
are set with the various REGION attribute commands (e.g.,
REGION FILL ON, REGION FILL COLOR BLUE).
----------------------------------------------------------------------
The following enhancements were made to DATAPLOT January-April 1997.
----------------------------------------------------------------------
1. A check is now performed to determine if DPPL2F.DAT is opened
successfully upon starting Dataplot. If not, an error message is
printed and Dataplot is terminated. The typical cause for this
is trying to run Dataplot in a read only directory. This change
provides a more graceful exit.
2. The Dataplot Reference Manual is now available on-line. The
Dataplot home page can be accessed from a Web browser using
the URL:
http://www.itl.nist.gov/div898/software/dataplot/homepage.html
The Reference Manual is under the "documentation" table entry.
The following should be noted:
a) In order for these commands to work, you need to have
a web browser available on your system.
The Dataplot web pages display correctly with the Netscape,
Internet Explorer, and HotJava 1.1 browsers. They do not
display correctly with the HotJava 1.0, Mosaic, or character
oriented browsers. We do not have access to other browsers,
so we can make no specific comment on them.
b) The Reference Manual is in PDF format (Portable Document
Format), so it requires a PDF viewer. Typically, this is the
Adobe Acrobat Reader. This reader is supported on most common
platforms and can be downloaded for free. The PC installation
typically takes about 10-15 minutes to download and install.
For best performance, it is strongly recommended that the
Adobe Acrobat reader be installed as a plug-in (this is
done automatically for Netscape on the PC) rather than
as a helper application. The documentation web page contains
a link to the Adobe Acrobat web site for downloading the
reader.
In addition, several commands are now available for accessing
the Web, and the Dataplot Web pages and Reference Manual in
particular, from within Dataplot.
The first command is:
WEB
WEB NIST/SIMA/HPPC/SED/DATAPLOT
WEB <url address>
By default, this command activates Netscape with the specified
URL. If no URL is given, the NIST home page is used. Several
keywords are recognized. For example, SED activates the
NIST Statistical Engineering Division home page.
The second command is:
WEB HELP <string>
This command is similar to the standard Dataplot HELP command.
However, it accesses the on-line Reference Manual rather than
the ASCII text help files. <string> will usually be a Dataplot
command (e.g., WEB HELP FIT, WEB HELP PLOT). However, many
special keywords are also recognized. For example, WEB HELP or
WEB HELP DATAPLOT access the Dataplot home page. Enter the
command:
LIST REFMAN.TEX
to see a list of recognized keywords (the upper case entries in
columns 1-40 identify the keywords while columns 40+ identify the
associated URL).
The WEB and WEB HELP commands are supported for Unix platforms
and for the Windows 95/NT version.
A few SET commands were added to support the WEB and WEB HELP
commands.
a) By default, Dataplot tries to use the Netscape browser. On
Unix, it tries to do this by entering the command "netscape".
On Windows 95/NT, it enters
"C:\Program Files\NETSCAPE\NAVIGATOR\PROGRAM\netscape.exe"
If you wish to use a different browser, or if Netscape is
installed in a different location, you can enter the
following command:
SET BROWSER <file name>
where <file name> is the string that activates your preferred
browser. In particular, if you prefer to use the Internet
Explorer under Windows 95/NT, you can enter:
SET BROWSER "C:\Program Files\Plus!\Microsoft Internet\iexplore.exe"
The enclosing quotes are required because the file name contains
spaces. Again, check to see if this is the proper path on
your system.
Alternatively, you can enter the Unix command
setenv BROWSER <file name>
or the Windows 95/NT command
SET BROWSER=<file name>
to set the browser. These are typically placed in your
start-up files (.login or .cshrc for Unix, AUTOEXEC.BAT for
Windows 95/NT). You can shorten the browser name if you add
the correct directory to your path.
b) For the WEB command, the default URL is the NIST home page.
You can change the default with the following Dataplot command:
SET URL <default URL>
For the WEB HELP command, the default URL is the Dataplot
home page on the public NIST web server. This can be
changed (for example, if you have installed the Dataplot
web pages and Reference Manual on a local site) by entering
the command:
SET DATAPLOT URL <location of Dataplot web pages>
Alternatively, you can enter the Unix commands
setenv URL <location of default URL>
setenv DPURL <location of Dataplot web pages>
or the Windows 95/NT commands
SET URL=<location of default URL>
SET DPURL=<location of Dataplot web pages>
For Unix platforms, the following command was added to tell
Dataplot to use a currently open NETSCAPE window (this command
is not needed for the PC):
SET NETSCAPE <OLD/NEW>
These commands have been tested with NETSCAPE on Unix and
with Netscape and the Internet Explorer on the PC.
One important difference between the Unix and PC versions of
these commands should be noted. Under Unix, once the WEB command
is initiated, control returns to Dataplot after the browser is
started. You can independently navigate in the browser and
enter additional Dataplot commands. However, on the PC, control
does not return to Dataplot until you exit the browser.
3. The following commands were added to allow previously viewed
graphs to be saved for later recall. The primary purpose is
to allow comparisons of a previous graph to a current graph.
These commands are currently only supported for the X11 graphics
device (available on most Unix implementations).
SAVE PLOT <file> (or SAVE GRAPH, SP, SG)
SAVE PLOT <file> AUTOMATIC
SAVE PLOT AUTOMATIC
REPEAT PLOT <file> (or REPEAT GRAPH, RP, RG, VIEW PLOT,
VIEW GRAPH, VG, VP)
REPEAT PLOT <+n>
REPEAT PLOT <-n>
LIST PLOT (or LIST GRAPH, LP, LG)
CYCLE PLOT (or CYCLE GRAPH, CG, CP)
PIXMAP TITLE <title>
As a technical note, the plots are saved in X11 "bitmap" format.
This is distinct from the X11 image format that is used by
xwd to save a screen image. This choice was made for performance
reasons (xlib provides direct routines for reading and writing
bitmaps, but not for reading and writing images). The primary
limitations are:
i) Color is not supported for X11 bitmaps. Elements drawn
in color will not be saved in the bitmap.
ii) You cannot use the X11 tools xwd and xwud to view the
saved plots independently of Dataplot. However, they
can be viewed by any software that reads X11 bitmaps.
The saved plots are essentially screen dumps. There is
currently no "linking" in the sense that if a given variable
is changed the saved plots are automatically updated.
The SAVE GRAPH command saves the current plot in the user
specified file. If no file name is specified, then the file
name "pixmap.<n>", where <n> is a counter, is used.
The keyword AUTOMATIC tells Dataplot to automatically save all
subsequent plots. With the AUTOMATIC option, Dataplot does not
save the current graph until the next plot is generated. This is
done in order to correctly handle multi-plots and diagrammatic
graphics. That is, the current graph is saved whenever a screen
erase is performed. If a filename is provided, this will be used
as the base (the ".<n>" is added). For example,
SAVE PLOT HISTOGRAMS AUTOMATIC saves subsequent plots in
the files HISTOGRAMS.1, HISTOGRAMS.2, and so on. Enter SAVE GRAPH
AUTOMATIC OFF to terminate the automatic saving of the plots.
The REPEAT PLOT command reads a saved plot and draws it in a
window that is distinct from the normal Dataplot X11 graphics
window. If no file is specified, or if <n> is 0 for REPEAT
PLOT, the most current saved plot is drawn. A <+n> takes the
Nth plot from the current list. A <-n> takes the "current - n"th
plot from the current plot list. The DEVICE 1 X11 command
must be entered before the REPEAT PLOT command can be used.
The REPEAT PLOT command can redraw plots that were created in
a previous Dataplot session. In fact, it will successfully
redraw any file that is in the X11 bitmap format (but not in
xwd format).
The LIST PLOT command lists the currently saved plots (by
sequence number, file name, and title). It only lists plots
saved in the current session. However, this includes graphs
created in a previous Dataplot session that have been redrawn
with the REPEAT GRAPH command. Dataplot does not maintain a
database of previously saved plots.
The CYCLE PLOT command allows you to cycle through the pixmaps
in the current list by clicking mouse buttons. Clicking the
left mouse button moves down in the current list, clicking the
right mouse button moves up in the current list, and clicking
the middle mouse button returns control to Dataplot. At least
one REPEAT PLOT command should be entered before using this
command.
The PIXMAP TITLE command allows you to specify the title for
a saved plot. This title is simply for convenience in listing
the saved plots. It is not saved as part of the file and the
title only applies to the current Dataplot session. The default
title is the file name.
The pixmap title applies to the current plot when the SAVE GRAPH
command is entered. It does not matter whether the PLOT or
PIXMAP TITLE command is entered first.
Be aware that for SAVE GRAPH AUTOMATIC the saving for a given
plot is not executed until the next screen erase (typically the
next plot) is encountered to allow for multi-plotting and the
addition of diagrammatic graphics to a plot. The order of
the commands would typically be something like:
SAVE GRAPH AUTOMATIC
4-PLOT Y
PIXMAP TITLE 4-PLOT
PLOT Y
PIXMAP TITLE PLOT Y
HISTOGRAM Y
PIXMAP TITLE HISTOGRAM
The main point here is that the PIXMAP TITLE comes AFTER the
plot command.
Unlike the regular TITLE command, the PIXMAP TITLE command does
not persist. That is, it applies only to the next saved plot and
then reverts to the default of using the file name.
4. Added the following special functions:
a) LAMBDA(X,V) - Lambda function (V can be integer or real)
b) LAMBDAP(X,V) - derivative of Lambda function (V can be integer
or real)
c) H0(X) - Struve function order 0
d) H1(X) - Struve function order 1
e) HV(X,V) - Struve function order V
f) L0(X) - modified Struve function order 0
g) L1(X) - modified Struve function order 1
h) LV(X,V) - modified Struve function order V
i) Added LOGBETA as a synonym for LNBETA and LNGAMMA as a synonym for
LOGGAMMA.
5. The following bug fixes were made:
a) Fixed bug where TEXT command automatically generated a software
font (introduced by the DEVICE FONT command).
b) Fixed bug in the ANOVA command.
c) Fixed bug with ERASE command on Windows NT version.
d) Fixed bug in HELP with conflict between STATUS and
STATISTIC PLOT.
e) Fixed bug if software font used and CHARACTER BLANK was
entered in lower case.
f) Fixed bug where CREATE <file> went into an infinite loop if
a CALL command was encountered. The CALL command will now
be saved correctly in the CREATE file. Note that the commands
in the CALL file are not saved in the CREATE file (they are
already saved as part of the CALL macro file).
----------------------------------------------------------------------
The following enhancements were made to DATAPLOT October-November 1996.
----------------------------------------------------------------------
1. A native mode Windows 95/NT version is now available. This
version was created using the Microsoft Windows 95 compiler.
The initial release supports the command line version only.
We will attempt over the next several months to port the
Tcl/Tk based graphical user interface to the Windows 95/NT
environment.
To generate graphics to the screen for this version, enter
the following command:
DEVICE 1 QWIN
Enter the command HELP QWIN for details of using this device.
2. For encapsulated Postscript files, DATAPLOT previously computed the
bounding box parameters assuming an 11 x 11 inch page. This was done to
accommodate both landscape and portrait orientation plots.
Unfortunately, this did not generate satisfactory results when
importing DATAPLOT graphics into WordPerfect and other text
processing software. The user had to do a fair amount of manual
rotation and scaling of plots.
DATAPLOT now adjusts the bounding box depending on the orientation.
It uses 11 x 8.5 inch for landscape orientation and 8.5 x 11 inch
for portrait. However, most text processors ignore the rotation
and translation that the landscape plots request. To compensate
for this, the following command was added:
ORIENTATION LANDSCAPE WORDPERFECT
This essentially generates a landscape orientation on a portrait
page. That is, the bounding box specifies an 8.5 x 6.5 inch page.
This generates excellent results with WordPerfect (users should
normally never need to adjust the bounding box parameters or
perform manual rotation and translation in WordPerfect).
This option is only recognized for encapsulated Postscript.
Regular Postscript should still use ORIENTATION LANDSCAPE.
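As a rough illustration of the page sizes quoted above, the following
Python sketch converts them to Postscript points (72 points per inch).
The function name and the choice of a lower left corner at the origin
are assumptions made for illustration only; the exact values DATAPLOT
writes to the bounding box are not stated in this file.

```python
POINTS_PER_INCH = 72  # default Postscript user-space unit


def bbox_points(width_in, height_in):
    """Return (llx, lly, urx, ury) in points.

    Assumes the lower left corner sits at the origin, which is an
    illustrative choice and not necessarily what DATAPLOT emits.
    """
    return (0, 0,
            round(width_in * POINTS_PER_INCH),
            round(height_in * POINTS_PER_INCH))


print(bbox_points(11, 8.5))   # landscape page
print(bbox_points(8.5, 11))   # portrait page
print(bbox_points(8.5, 6.5))  # ORIENTATION LANDSCAPE WORDPERFECT page
```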
3. Fixed a few bugs:
a. Macros now accept more than 1,000 lines.
b. Unix executables were not finding certain auxiliary files
if the file names were entered in lower case.
c. NORMAL PLOT fixed.
4. The output for the YATES command was modified to be more readable
and informative.
----------------------------------------------------------------------
The following enhancements were made to DATAPLOT July 1996.
----------------------------------------------------------------------
1. The previous fix (checking the HOME environment variable for the
user's root directory) was refined a bit. If HOME is defined,
it looks for dplogf.tex in that directory. If dplogf.tex is
not found, instead of printing an error message, it then strips
off the path name and looks for it in the current directory and
then in the DATAPLOT directory (typically /usr/local/lib/dataplot).
Note that if an error message is printed saying that this file is
not found, DATAPLOT will still run. This file simply lets you
enter some DATAPLOT commands when starting DATAPLOT (i.e., for
setting your preferred defaults). There should not be any
negative side effects if this file is not executed.
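The search order described in item 1 can be sketched in Python as
follows. This is only an illustration of the lookup sequence, not
DATAPLOT's actual code. The function name and the "exists" parameter
are hypothetical, and the fallback used when HOME is undefined (current
directory, then the DATAPLOT directory) is an assumption.

```python
import os


def find_startup_file(home, current_dir, dataplot_dir,
                      name="dplogf.tex", exists=os.path.isfile):
    """Return the first start-up file found, or None.

    home may be None or empty when the HOME variable is not defined.
    exists is a hypothetical hook so the lookup can be tried without
    touching the real file system.
    """
    candidates = []
    if home:
        candidates.append(os.path.join(home, name))
    candidates.append(os.path.join(current_dir, name))
    candidates.append(os.path.join(dataplot_dir, name))
    for path in candidates:
        if exists(path):
            return path
    # Not finding the file is not fatal: DATAPLOT still runs.
    return None
```

For example, find_startup_file(os.environ.get("HOME"), os.getcwd(),
"/usr/local/lib/dataplot") walks the three locations in order.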
2. Unix versions will check for the environment variable
DATAPLOT_WEB. If this variable is defined, DATAPLOT assumes it
is being run from the web (e.g., from Mosaic or Netscape).
Currently, the only effect is that certain files that DATAPLOT
typically creates in the current directory, such as dppl1f.dat
and dpconf.tex, are opened in the /tmp directory. This may or
may not be expanded upon as we gain more experience running
DATAPLOT from web servers.
3. We built a "double precision" version for the Sun. That is,
the -p8 option was used so that single precision numbers are
64-bit rather than 32-bit. The only complication was in how the
X11 routines were called (these are compiled with 32-bit real
numbers). Changes were made to the X11 driver to allow a
"compile flag" to be set based on which case (i.e., 32 or 64-bit)
is desired. This means that DATAPLOT can be easily built on any
Unix system that supports the "-p8" option (or a compiler switch
that provides a similar capability).
4. A version of DATAPLOT was built using the LAHEY compiler
(previously, the OTG compiler was used). This version allows
DATAPLOT to be run on PCs without special AUTOEXEC.BAT and
CONFIG.SYS files (and therefore no rebooting to run DATAPLOT).
A device driver that uses the LAHEY graphics library is also
available. Enter
DEVICE 1 LAHEY
DEVICE 1 FONT SIMPLEX (this is described below)
5. The following command was added:
DEVICE <1/2/3> FONT <font name>
This allows the screen device to use a different font than the
printed output. This was specifically motivated for the LAHEY
device driver. This driver does a very poor job with hardware
characters. Using a software font avoids this problem, but
often hardware characters are desired for the printed Postscript
output (to take advantage of the typeset quality fonts available
with Postscript). Using DEVICE 1 FONT SIMPLEX allows us
to get decent characters on the screen and still retain the
ability to use the Postscript fonts. Although this command
was motivated by the LAHEY device, it is also useful for other
screen devices (e.g., X11 hardware fonts are a fixed size, so
only 1 character size is available at a time, Tektronix devices
are limited to 4 discrete sizes, etc.).
6. Previously, log scales required at least 1 full cycle (e.g.,
10 to 100). It is now possible to get around this limitation.
For example, to have a log scale go from 85 to 125, do the
following:
YLOG ON
YLIMITS 100 100
YTIC OFFSET 15 25
PLOT Y
The key is that the lower and upper bound on the LIMITS command
must be the same and at least one of the TIC OFFSETS must be
greater than zero. Major TICS will be generated at this bound
and also at the frame limits. Minor tics will be plotted
where appropriate. Also, the TIC OFFSET is always interpreted
in data units for this case (i.e., you cannot specify the offset
in DATAPLOT 0 to 100 coordinates as you normally can).
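The arithmetic behind the example in item 6 can be checked with a
short Python sketch. The function name is hypothetical, and the error
checks simply restate the conditions given above (equal limits, at
least one positive offset); they are not taken from DATAPLOT's source.

```python
def log_frame_limits(limit, lower_offset, upper_offset):
    """Frame limits for a log axis with equal LIMITS and data-unit offsets."""
    if lower_offset <= 0 and upper_offset <= 0:
        raise ValueError("at least one tic offset must be greater than zero")
    lower = limit - lower_offset
    upper = limit + upper_offset
    if lower <= 0:
        raise ValueError("a log scale needs a positive lower frame limit")
    return (lower, upper)


# YLIMITS 100 100 with YTIC OFFSET 15 25 gives a frame from 85 to 125.
print(log_frame_limits(100, 15, 25))
```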
7. Several bugs were fixed.
----------------------------------------------------------------------
The following enhancement was made to DATAPLOT June 1996.
----------------------------------------------------------------------
For Unix systems, DATAPLOT now checks for the HOME environment variable. This
normally specifies the user's home directory. If present, DATAPLOT
looks for the user's start-up file (dplogf.tex) in the user's home
directory rather than the current directory. This means you no
longer have to include the start-up file in each directory from
which you run DATAPLOT. If HOME is not found, DATAPLOT looks for
dplogf.tex in the current directory. Note that if HOME is found and dplogf.tex
is not found in the home directory, DATAPLOT will NOT look for
it in the current directory.
----------------------------------------------------------------------
The following enhancements were made to DATAPLOT MAY, 1996.
----------------------------------------------------------------------
1) Fixed a bug where the X11 driver bombed if being run remotely
and the SET X11 PIXMAP ON command was used.
2) Fixed a bug where the 3D-PLOT was bombing when a large number of
points were plotted.
----------------------------------------------------------------------
The following enhancements were made to DATAPLOT FEBRUARY-APRIL, 1996.
----------------------------------------------------------------------
1) The following probability functions were added:
LET A = BBNCDF(X,ALPHA,BETA,N) - beta-binomial cumulative
distribution function
LET A = BBNPDF(X,ALPHA,BETA,N) - beta-binomial probability
density function
LET A = BBNPPF(P,ALPHA,BETA,N) - beta-binomial percent point
function
LET A = BRACDF(X,BETA) - Bradford cumulative distribution
function
LET A = BRAPDF(X,BETA) - Bradford probability density function
LET A = BRAPPF(P,BETA) - Bradford percent point function
LET A = DGACDF(X,GAMMA) - double gamma cumulative distribution
function
LET A = DGAPDF(X,GAMMA) - double gamma probability density
function
LET A = DGAPPF(P,GAMMA) - double gamma percent point function
LET A = FCACDF(X,U,SD) - folded Cauchy cumulative distribution
function
LET A = FCAPDF(X,U,SD) - folded Cauchy probability density
function
LET A = FCAPPF(P,U,SD) - folded Cauchy percent point function
LET A = GEXCDF(X,LAM1,LAM2,S) - generalized exponential
cumulative distribution function
LET A = GEXPDF(X,LAM1,LAM2,S) - generalized exponential
probability density function
LET A = GEXPPF(P,LAM1,LAM2,S) - generalized exponential
percent point function
LET A = GLOCDF(X,ALPHA) - generalized logistic cumulative
distribution function
LET A = GLOPDF(X,ALPHA) - generalized logistic probability
density function
LET A = GLOPPF(P,ALPHA) - generalized logistic percent point
function
LET A = KAPCDF(X,AK,B,T) - Mielke's beta-kappa cumulative
distribution function
LET A = KAPPDF(X,AK,B,T) - Mielke's beta-kappa probability
density function
LET A = KAPPPF(P,AK,B,T) - Mielke's beta-kappa percent point
function
LET A = NCCPDF(X,V,DELTA) - non-central chi-square probability
density function
LET A = PEXCDF(X,ALPHA,BETA) - exponential power cumulative
distribution function
LET A = PEXPDF(X,ALPHA,BETA) - exponential power probability
density function
LET A = PEXPPF(P,ALPHA,BETA) - exponential power percent point
function
The following probability plots were added:
LET ALPHA = <value>
LET BETA = <value>
LET N = <value>
BETA BINOMIAL PROBABILITY PLOT Y
LET BETA = <value>
BRADFORD PROBABILITY PLOT Y
LET GAMMA = <value>
DOUBLE GAMMA PROBABILITY PLOT Y
LET M = <value>
LET SD = <value>
FOLDED CAUCHY PROBABILITY PLOT Y
LET LAMBDA1 = <value>
LET LAMBDA2 = <value>
LET S = <value>
GENERALIZED EXPONENTIAL PROBABILITY PLOT Y
LET ALPHA = <value>
GENERALIZED LOGISTIC PROBABILITY PLOT Y
LET BETA = <value>
LET THETA = <value>
LET K = <value>
MIELKE BETA-KAPPA PROBABILITY PLOT Y
LET ALPHA = <value>
LET BETA = <value>
EXPONENTIAL POWER PROBABILITY PLOT Y
The following probability plot correlation coefficient plots
were added:
BRADFORD PPCC PLOT Y
DOUBLE GAMMA PPCC PLOT Y
GENERALIZED LOGISTIC PPCC PLOT Y
2) The WRITE command was updated to handle a maximum of 25 variables
(up from 10).
Support was added for writing Fortran unformatted data files.
This was done primarily for sites that have created "mega" size
versions of DATAPLOT where the time entailed in reading and writing
large data files becomes important. For standard size DATAPLOT
(typically a maximum of 10,000 rows with 10 columns for 100,000
data points total), the use of the SET READ FORMAT and SET WRITE
FORMAT commands provides adequate performance. However, the
unformatted read and write capability is available regardless of
the workspace size. The advantage of unformatted reads and writes
is that the data files are much smaller (typically by a factor of
10 or more) and reading and writing the data is significantly faster.
The disadvantage is that unformatted files are binary, and thus
cannot be modified or viewed with a standard text editor. Also,
Fortran unformatted files are NOT transportable across different
computer systems. Also, unformatted Fortran files are NOT
equivalent to C language byte stream files (these types of files
are not currently supported in DATAPLOT).
An unformatted write is accomplished by entering the command:
SET WRITE FORMAT UNFORMATTED
and then entering a standard WRITE command. For example,
WRITE LARGE.DAT X1 X2 X3
There are 2 ways to create the unformatted file in Fortran. For
example, suppose X and Y are to be written to an unformatted
file. The WRITE can be generated by:
a) WRITE(IUNIT) (X(I),Y(I),I=1,N)
b) WRITE(IUNIT) X,Y
The distinction is that (a) stores the data as X(1), Y(1),
X(2), Y(2), ..., X(N), Y(N) while (b) stores all of X then
all of Y. There is no inherent advantage in either method in
terms of performance or file size. The SET WRITE FORMAT
UNFORMATTED command only supports (a).
Unformatted writing is supported only for variables or matrices
(i.e., not for parameters or strings).
Be aware that Fortran unformatted files are NOT transportable
across systems. This is due to the fact that the file contains
various header bytes (the Fortran standard leaves implementation
of this up to the vendor) that are not standard. Also, the storage
of real numbers can vary between platforms. This means that
the SET WRITE FORMAT UNFORMATTED command can NOT be used to write
raw binary files (as might be produced by a C program) and it
cannot, in general, be used to write unformatted Fortran files
that can be read on systems other than the one you are running
DATAPLOT on.
3) The command SET RELATIVE HISTOGRAM <AREA/PERCENT> was added to
specify whether or not relative histograms (and relative
bi-histograms) are drawn so that the area under the histogram
sums to 1 or so that the heights of the histograms sum to 1.
The first option, which is the default, is useful when using the
relative histogram as an estimate of a probability distribution.
The second option is useful when you want to see what percentage
of the data falls in a given class.
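The difference between the two normalizations in item 3 can be seen in
a small Python sketch. This is an illustration under simple
assumptions (equal-width classes, a hypothetical function name and
argument list) and does not reproduce DATAPLOT's class width rules.

```python
def relative_histogram(data, lower, width, nbins, mode="AREA"):
    """Return bar heights for equal-width classes starting at lower.

    mode="AREA":    heights times class width sum to 1.
    mode="PERCENT": heights themselves sum to 1.
    """
    counts = [0] * nbins
    for value in data:
        index = int((value - lower) // width)
        if 0 <= index < nbins:
            counts[index] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no data fall inside the classes")
    if mode == "AREA":
        return [c / (total * width) for c in counts]
    if mode == "PERCENT":
        return [c / total for c in counts]
    raise ValueError("mode must be AREA or PERCENT")


y = [0.5, 1.5, 1.5, 2.5]
print(relative_histogram(y, 0.0, 2.0, 2, "AREA"))
print(relative_histogram(y, 0.0, 2.0, 2, "PERCENT"))
```

With the AREA option the heights depend on the class width, so they
can be compared directly with a probability density function.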
4) For Unix versions, the location of the DATAPLOT auxiliary files
can be specified with the following Unix command:
setenv DATAPLOT_FILES <directory name>
This can be useful if you do not have super user permission to
copy the files into the /usr/local/lib/dataplot directory and
you do not have a cooperative system administrator.
5) The LET STRING command was modified so that the case of the
text in the string is preserved as entered. Note that the
LET FUNCTION command still converts text to upper case.
The READ STRING command was modified so that it ignores the
SET READ FORMAT command.
6) Numerous minor bugs were fixed.
-----------------------------------------------------------------
The following enhancements were made to DATAPLOT AUGUST-OCTOBER, 1995.
-----------------------------------------------------------------
1) The Numerical Recipes routine for calculating complex roots
was replaced with a CMLIB routine. There is no change in the
command syntax.
2) The Numerical Recipes routine for calculating the fast Fourier
transform was replaced with CMLIB routines. A couple of changes
were made as follows:
a) The CMLIB routine does not require zero padding so that
the length of the variable is a power of two. Previously,
DATAPLOT did this automatically. It no longer does. However,
the CMLIB algorithm loses efficiency if the length is not a
product of small primes. In this case, you may wish to zero
pad the variable yourself before calling the FFT command.
b) The SET FOURIER EXPONENT <+/-> command was corrected to work
as intended (the default implemented the + case, which was really
the only option that worked). In addition, this command was
extended to apply to the FOURIER and INVERSE FOURIER commands
as well as the FFT and INVERSE FFT commands. Enter
HELP FOURIER EXPONENT for more information on this command.
c) Most FFT routines return the data in the following order:
F(1) = zero frequency
F(2) ... F(N/2) = smallest positive frequency to largest
positive frequency
F(N/2+1) = aliased point that contains the largest
positive and the largest negative frequency
F(N/2+2) ... F(N) = negative frequencies from largest
magnitude to smallest magnitude
By default, DATAPLOT returns the data in the following order:
F(1) = aliased point that contains the largest
positive and the largest negative frequency
F(2) ... F(N/2) = Largest positive frequency to smallest
positive frequency
F(N/2+1) = zero frequency
F(N/2+2) ... F(N) = negative frequencies from smallest
magnitude to largest magnitude
The command SET FOURIER ORDER <STANDARD/DATAPLOT> was
implemented to allow you to specify which order to use.
The option STANDARD returns the first order while the option
DATAPLOT returns the second order.
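Two points from item 2 can be sketched in Python. The first helper
pads a series with zeros up to a length whose only prime factors are
small; treating 2, 3 and 5 as the "small" primes is an assumption, not
something stated for CMLIB. The second helper rearranges values given
in the STANDARD order into the DATAPLOT order, following the two
listings above as read here (even N only). Both function names are
hypothetical.

```python
def zero_pad(values, primes=(2, 3, 5)):
    """Append zeros until the length factors entirely into primes."""
    def factors_into(n):
        for p in primes:
            while n % p == 0:
                n //= p
        return n == 1

    target = max(len(values), 1)
    while not factors_into(target):
        target += 1
    return list(values) + [0.0] * (target - len(values))


def standard_to_dataplot_order(values):
    """Reorder FFT output from STANDARD order to DATAPLOT order."""
    n = len(values)
    if n % 2 != 0:
        raise ValueError("this sketch only handles an even length")
    half = n // 2
    # Aliased point first, then positive frequencies from largest to
    # smallest, then zero, then negative frequencies from smallest
    # magnitude to largest.
    order = list(range(half, -1, -1)) + list(range(n - 1, half, -1))
    return [values[k] for k in order]


# Frequencies (in cycles) in STANDARD order for N = 8.
freqs = [0, 1, 2, 3, 4, -3, -2, -1]
print(standard_to_dataplot_order(freqs))
print(len(zero_pad([1.0] * 97)))
```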
3) Support was added for hypergeometric, non-central chi-square,
singly and doubly non-central F, half-cauchy and folded normal
random numbers.
The following probability functions were added:
LET A = ANGCDF(X) - anglit cumulative distribution function
LET A = ANGPDF(X) - anglit density function
LET A = ANGPPF(X) - anglit percent point function
LET A = ARSCDF(X) - arcsin cumulative distribution function
LET A = ARSPDF(X) - arcsin density function
LET A = ARSPPF(X) - arcsin percent point function
LET A = DWECDF(X,G) - double Weibull cumulative distribution
function
LET A = DWEPDF(X,G) - double Weibull density function
LET A = DWEPPF(X,G) - double Weibull percent point function
LET A = EWECDF(X,G) - exponentiated Weibull cumulative
distribution function
LET A = EWEPDF(X,G) - exponentiated Weibull density function
LET A = EWEPPF(X,G) - exponentiated Weibull percent point function
LET A = FNRCDF(X,U,SD) - folded normal cumulative distribution
function
LET A = FNRPDF(X,U,SD) - folded normal probability density
function
LET A = FNRPPF(X,U,SD) - folded normal percent point function
LET A = GEVCDF(X,G) - generalized extreme value cumulative
distribution function
LET A = GEVPDF(X,G) - generalized extreme value density function
LET A = GEVPPF(X,G) - generalized extreme value percent point
function
LET A = GOMCDF(X,C,B) - Gompertz cumulative distribution function
LET A = GOMPDF(X,C,B) - Gompertz probability density function
LET A = GOMPPF(X,C,B) - Gompertz percent point function
LET A = HFCCDF(X) - half-Cauchy cumulative distribution function
LET A = HFCPDF(X) - half-Cauchy density function
LET A = HFCPPF(X) - half-Cauchy percent point function
LET A = HFLCDF(X,G) - generalized half-logistic cumulative
distribution function
LET A = HFLPDF(X,G) - generalized half-logistic density function
LET A = HFLPPF(X,G) - generalized half-logistic percent point
function
LET A = HSECDF(X) - hyperbolic secant cumulative distribution
function
LET A = HSEPDF(X) - hyperbolic secant density function
LET A = HSEPPF(X) - hyperbolic secant percent point function
LET A = LGACDF(X,G) - log-gamma cumulative distribution function
LET A = LGAPDF(X,G) - log-gamma density function
LET A = LGAPPF(X,G) - log-gamma percent point function
LET A = PA2CDF(X,G) - Pareto type 2 cumulative distribution
function
LET A = PA2PDF(X,G) - Pareto type 2 density function
LET A = PA2PPF(X,G) - Pareto type 2 percent point function
LET A = TNRCDF(X,A,B,U,SD) - truncated normal cumulative
distribution function
LET A = TNRPDF(X,A,B,U,SD) - truncated normal probability density
function
LET A = TNRPPF(X,A,B,U,SD) - truncated normal percent point
function
LET A = TNECDF(X,X0,U,SD) - truncated exponential cumulative
distribution function
LET A = TNEPDF(X,X0,U,SD) - truncated exponential probability
density function
LET A = TNEPPF(X,X0,U,SD) - truncated exponential percent point
function
LET A = WCACDF(X,G) - wrapped-up Cauchy cumulative distribution
function
LET A = WCAPDF(X,G) - wrapped-up Cauchy density function
LET A = WCAPPF(X,G) - wrapped-up Cauchy percent point function
The following probability plots were added:
ANGLIT PROBABILITY PLOT Y
ARCSIN PROBABILITY PLOT Y
HYPERBOLIC SECANT PROBABILITY PLOT Y
HALF CAUCHY PROBABILITY PLOT Y
LET M = <value>
LET SD = <value>
FOLDED NORMAL PROBABILITY PLOT Y
LET A = <value>
LET B = <value>
LET M = <value> (optional, defaults to 0)
LET SD = <value> (optional, defaults to 1)
TRUNCATED NORMAL PROBABILITY PLOT Y
LET X0 = <value>
LET M = <value> (optional, defaults to 0)
LET SD = <value> (optional, defaults to 1)
TRUNCATED EXPONENTIAL PROBABILITY PLOT Y
LET GAMMA = <value>
DOUBLE WEIBULL PROBABILITY PLOT Y
LOG GAMMA PROBABILITY PLOT Y
GENERALIZED EXTREME VALUE PROBABILITY PLOT Y (or GEV PROB PLOT)
PARETO SECOND KIND PROBABILITY PLOT Y (or PARETO TYPE 2)
HALF LOGISTIC PROBABILITY PLOT Y (GAMMA optional for this case)
LET GAMMA = <value>
LET THETA = <value>
EXPONENTIATED WEIBULL PROBABILITY PLOT Y
LET C = <value>
LET B = <value>
GOMPERTZ PROBABILITY PLOT Y
LET C = <value>
WRAPPED CAUCHY PROBABILITY PLOT Y
The following probability plot correlation coefficient plots were
added:
LOG GAMMA PPCC PLOT Y
DOUBLE WEIBULL PPCC PLOT Y
GENERALIZED EXTREME VALUE PPCC PLOT Y (or GEV PPCC PLOT)
PARETO SECOND KIND PPCC PLOT Y (or PARETO TYPE 2 PPCC PLOT)
WRAPPED CAUCHY PPCC PLOT Y
HALF LOGISTIC PPCC PLOT Y
4) The following character option was added:
CHARACTER PIXEL
This option plots a single "pixel" on a given device. In addition,
when this option is given, the CHARACTER SIZE is interpreted as
an integer expansion factor. For example, CHARACTER SIZE 10 will
plot a 10x10 pixel block.
This option has been implemented for the Tektronix, X11,
Postscript, HP-GL, Regis, HP-2622, and Sun devices. Other devices
will print a message saying this option is unavailable (although
additional devices will be added later).
Although this capability was added with some possible future
enhancements in mind, it can be useful in some plots such as
fractal plots.
-----------------------------------------------------------------
The following enhancements were made to DATAPLOT JULY, 1995.
-----------------------------------------------------------------
Support was added for various types of orthogonal polynomials.
The following commands were added.
LET A = LEGENDRE(X,N) Compute the Legendre polynomial of
order n
LET A = LEGENDRE(X,N,M) Compute the associated Legendre
polynomial of order n and degree m
LET A = NRMLEG(X,N) Compute the normalized Legendre
polynomial of order n
LET A = NRMLEG(X,N,M) Compute the associated normalized
Legendre polynomial of order n and
degree m
LET A = LEGP(X,N) Compute the Legendre function of the
first kind of order n
LET A = LEGP(X,N,M) Compute the associated Legendre function
of the first kind of order n and degree m
LET A = LEGQ(X,N) Compute the Legendre function of the
second kind of order n
LET A = LEGQ(X,N,M) Compute the associated Legendre function
of the second kind of order n and
degree m
LET A = SPHRHRMR(X,P,N,M) Compute the real component of the
spherical harmonic function
LET A = SPHRHRMC(X,P,N,M) Compute the complex component of the
spherical harmonic function
LET A = LAGUERRE(X,N) Compute the Laguerre polynomial of
order n
LET A = LAGUERRL(X,N,A) Compute the generalized Laguerre
polynomial of order n
LET A = NRMLAG(X,N) Compute the normalized Laguerre
polynomial of order n
LET A = CHEBT(X,N) Compute the Chebyshev T (first kind)
polynomial of order n
LET A = CHEBU(X,N) Compute the Chebyshev U (second kind)
polynomial of order n
LET A = JACOBIP(X,N,A,B) Compute the Jacobi polynomial of order n
LET A = ULTRASPH(X,N,A) Compute the Ultraspherical (or
Gegenbauer) polynomial of order n
LET A = HERMITE(X,N) Compute the Hermite polynomial of order n
LET A = LNHERMIT(X,N) Compute the log of the absolute value of
the Hermite polynomial of order n
LET A = HERMSGN(X,N) Compute the sign of the Hermite
polynomial (1 for positive, -1 for
negative, 0 for zero)
In addition, an alpha version of a graphical user interface is
available on some Unix systems. You can check with your local site
installer to see if it is available on your system. If it is
available, it is typically executed by entering the command:
xdp
At NIST, the frontend has been installed on the CAML Suns and
SGIs as well as the Convex. There are no plans to install it
on the Cray. For non-NIST sites, the following non-DATAPLOT programs
must be installed:
1) Tcl/Tk - Tool Command Language
2) Expect - a program for controlling the dialog among
interactive programs.
These are both popular public domain Unix utilities that can be
installed on most common Unix platforms.
-----------------------------------------------------------------
The following enhancements were made to DATAPLOT APRIL, 1995.
-----------------------------------------------------------------
1) Support was added for reading Fortran unformatted data files.
This was done primarily for sites that have created "mega" size
versions of DATAPLOT where the time entailed in reading large
data files becomes important. For standard size DATAPLOT
(typically a maximum of 10,000 rows with 10 columns for 100,000
data points total), the use of the SET READ FORMAT command
provides adequate performance. However, the unformatted read
capability is available regardless of the workspace size. The
advantage of unformatted reads is that the data files are much
smaller (typically by a factor of 10 or more) and reading the
data is significantly faster. The disadvantage is that unformatted
files are binary, and thus cannot be modified or viewed with a
standard text editor. Also, Fortran unformatted files are NOT
transportable across different computer systems.
An unformatted read is accomplished by entering the command:
SET READ FORMAT UNFORMATTED
and then entering a standard READ command. For example,
READ LARGE.DAT X1 X2 X3
There are 2 ways to create the unformatted file in Fortran. For
example, suppose X and Y are to be written to an unformatted
file. The WRITE can be generated by:
a) WRITE(IUNIT) (X(I),Y(I),I=1,N)
b) WRITE(IUNIT) X,Y
The distinction is that (a) stores the data as X(1), Y(1),
X(2), Y(2), ..., X(N), Y(N) while (b) stores all of X then
all of Y. There is no inherent advantage in either method in
terms of performance or file size. The SET READ FORMAT
UNFORMATTED command assumes (a). To specify (b), enter the
command:
SET READ FORMAT COLUMNWISE (or UNFORMATTEDCOLUMNWISE)
Unformatted reading is supported only for variables or matrices
(i.e., not for parameters or strings). Also, it only applies
when reading from a file. The limits for the maximum number of
rows and columns for a matrix still apply (500 rows and 100
columns on most systems). When reading a matrix, the number of
columns must be specified via the SET UNFORMATTED COLUMNS
command. For example,
SET READ FORMAT UNFORMATTED
SET UNFORMATTED COLUMNS 25
READ MATRIX.DAT M
The maximum size of the file that DATAPLOT can read is equal to
the workspace size on your implementation (100,000 or 200,000
points on most installations). For larger files, it will read
up to this number of data values.
The data is assumed to be a rectangular grid of data written in
a single chunk. Only single precision real numbers are
supported. By default, the entire file (up to the maximum number
of points) is read. DATAPLOT does provide 2 commands to allow
some control of what portion of the file is read:
SET UNFORMATTED OFFSET <value>
SET UNFORMATTED RECORDS <value>
The OFFSET specifies the number of data values at the beginning of
the file to skip. This is useful for skipping header lines
(similar to a SKIP command for reading ASCII files) and other
miscellaneous values. The RECORDS value is useful for reading
part of a larger file.
Be aware that Fortran unformatted files are NOT transportable
across systems. This is due to the fact that the file contains
various header bytes (the Fortran standard leaves implementation
of this up to the vendor) that are not standard. Also, the storage
of real numbers can vary between platforms. This means that
the SET READ FORMAT UNFORMATTED command can NOT be used to read
raw binary files (as might be produced by a C program) and it
cannot, in general, be used to read unformatted Fortran files
created on systems other than the one you are running DATAPLOT on.
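The two storage orders (a) and (b) described above differ only in how
the values are interleaved. The Python sketch below shows the ordering
and how each layout is split back into columns. It deliberately
ignores the vendor-specific header bytes and the binary number format,
so it is not a reader for real Fortran unformatted files, and all
function names are hypothetical.

```python
def layout_a(x, y):
    """Order produced by WRITE(IUNIT) (X(I),Y(I),I=1,N)."""
    flat = []
    for xi, yi in zip(x, y):
        flat.extend([xi, yi])
    return flat


def layout_b(x, y):
    """Order produced by WRITE(IUNIT) X,Y."""
    return list(x) + list(y)


def split_a(flat, ncols):
    """Recover columns from layout (a): take every ncols-th value."""
    return [flat[c::ncols] for c in range(ncols)]


def split_b(flat, ncols):
    """Recover columns from layout (b): take consecutive blocks."""
    nrows = len(flat) // ncols
    return [flat[c * nrows:(c + 1) * nrows] for c in range(ncols)]


x = [1.0, 2.0, 3.0]
y = [10.0, 20.0, 30.0]
print(layout_a(x, y))
print(layout_b(x, y))
```

Splitting layout (a) needs only the number of columns, which may be
why a command such as SET UNFORMATTED COLUMNS is required when a
matrix is read.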
2) The following mathematical library functions were added:
LET A = HEAVE(X,C) - Heaviside function (=1 if X>=C, 0
otherwise, C is 0 if no second argument)
LET A = CEIL(X) - ceiling function (integer value of x
rounded toward positive infinity)
LET A = FLOOR(X) - floor function (integer value rounded toward
negative infinity)
LET A = STEP(X) - step function (synonym for FLOOR(X))
LET A = GCD(X1,X2) - greatest common divisor of X1 and X2
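For readers more familiar with other languages, the definitions above
correspond roughly to the following Python. The heave function is a
hypothetical stand-in written from the definition given here. How
DATAPLOT treats non-integer arguments to GCD is not stated, so the
example uses integers only.

```python
import math


def heave(x, c=0.0):
    """Heaviside function as defined above: 1 if x >= c, otherwise 0."""
    return 1 if x >= c else 0


print(heave(0.5, 1.0), heave(1.0, 1.0), heave(0.0))
print(math.ceil(-1.5), math.ceil(1.2))    # rounds toward positive infinity
print(math.floor(-1.5), math.floor(1.2))  # rounds toward negative infinity
print(math.gcd(12, 18))
```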
3) The following command was added:
LET A = MAD Y - median absolute deviation
MEDIAN ABSOLUTE DEVIATION is a synonym for MAD. Given a variable
X with median value MED, the MAD is defined as the median of
the absolute value of (X-MED).
The BOOTSTRAP PLOT, JACKNIFE PLOT, STATISTIC PLOT, BLOCK PLOT, and
DEX PLOT commands were modified to support the MAD and AAD
statistics.
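The MAD definition given in item 3 translates directly into a few
lines of Python. This is a sketch of the definition only (no scaling
constant is applied, since none is mentioned above), and the function
name is chosen for illustration.

```python
from statistics import median


def mad(values):
    """Median of the absolute deviations from the median."""
    med = median(values)
    return median(abs(v - med) for v in values)


# The median is 3, the absolute deviations are 2, 1, 0, 1, 97,
# and their median is 1. The outlier 100 has little effect.
print(mad([1, 2, 3, 4, 100]))
```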
4) The PHD command was renamed DEX PHD. In addition, some I/O was
fixed in these routines.
5) Some bugs were fixed in the EDIT command. A few other
miscellaneous bugs were fixed.
6) The following functions were added to the probability library.
LET A = ALPCDF(X,ALPHA,BETA) - alpha cumulative distribution
function
LET A = ALPPDF(X,ALPHA,BETA) - alpha density function
LET A = ALPPPF(X,ALPHA,BETA) - alpha percent point function
LET A = CHCDF(X,NU) - chi cumulative distribution
function
LET A = CHPDF(X,NU) - chi density function
LET A = CHPPF(X,NU) - chi percent point function
LET A = COSCDF(X) - cosine cumulative distribution
function
LET A = COSPDF(X) - cosine density function
LET A = COSPPF(X) - cosine percent point function
LET A = DLGCDF(X,THETA) - logarithmic series cumulative
distribution function
LET A = DLGPDF(X,THETA) - logarithmic series density
function
LET A = DLGPPF(X,THETA) - logarithmic series percent point
function
LET A = GGDCDF(X,ALPHA,C) - generalized gamma cumulative
distribution function
LET A = GGDPDF(X,ALPHA,C) - generalized gamma density function
LET A = GGDPPF(X,ALPHA,C) - generalized gamma percent point
function
LET A = LLGCDF(X,DELTA) - log-logistic cumulative
distribution function
LET A = LLGPDF(X,DELTA) - log-logistic density function
LET A = LLGPPF(X,DELTA) - log-logistic percent point
function
LET A = PLNCDF(X,P,SD) - power lognormal cumulative
distribution function
LET A = PLNPDF(X,P,SD) - power lognormal density function
LET A = PLNPPF(X,P,SD) - power lognormal percent point
function
LET A = PNRCDF(X,P,SD) - power normal cumulative
distribution function
LET A = PNRPDF(X,P,SD) - power normal density function
LET A = PNRPPF(X,P,SD) - power normal percent point function
LET A = POWCDF(X,C) - power function cumulative
distribution function
LET A = POWPDF(X,C) - power function density function
LET A = POWPPF(X,C) - power function percent point
function
LET A = WARCDF(X,C,A) - Waring cumulative distribution
function
LET A = WARPDF(X,C,A) - Waring density function
LET A = WARPPF(P,C,A) - Waring percent point function
LET A = NCTPDF(X,NU,DELTA) - non-central t density function
(cumulative distribution and percent
point functions were added previously)
LET A = TNRPDF(X,A,B) - truncated normal density function
LET A = FNRPDF(X,U,SD) - folded normal density function
The Yule distribution is a special case of the Waring
distribution. Set A to 1 or simply omit the A parameter.
The generalized gamma distribution can handle negative values
for the C parameter (although not zero). Specifically, a value
of C = -1 is the inverted gamma distribution.
In addition, the log-normal cdf, pdf, and ppf functions were
upgraded to handle the standard deviation shape parameter (LGNCDF,
LGNPDF, LGNPPF). This parameter defaults to 1 if not specified.
In addition the following probability plots were added.
COSINE PROBABILITY PLOT Y
LET ALPHA = <value>
LET BETA = <value>
ALPHA PROBABILITY PLOT Y
LET P = <value>
LET SD = <value> (this parameter optional, defaults to 1)
POWER NORMAL PROBABILITY PLOT Y
LET P = <value>
LET SD = <value> (this parameter optional, defaults to 1)
POWER LOGNORMAL PROBABILITY PLOT Y
LET SD = <value>
LOGNORMAL PROBABILITY PLOT Y
LET C = <value>
POWER FUNCTION PROBABILITY PLOT Y
LET NU = <value>
CHI PROBABILITY PLOT Y
LET THETA = <value>
LOGARITHMIC SERIES PROBABILITY PLOT Y
LET DELTA = <value>
LOG LOGISTIC PROBABILITY PLOT Y
LET GAMMA = <value>
LET C = <value>
GENERALIZED GAMMA PROBABILITY PLOT Y
LET A = <value> (can omit for the Yule distribution)
LET C = <value>
WARING PROBABILITY PLOT Y
In addition the following PPCC plots were added.
LET SD = <value> (this parameter optional, defaults to 1)
POWER NORMAL PPCC PLOT Y
LET SD = <value> (this parameter optional, defaults to 1)
POWER LOGNORMAL PPCC PLOT Y
LET SD = <value>
LOGNORMAL PPCC PLOT Y
CHI PPCC PLOT Y
VON MISES PPCC PLOT Y
POWER FUNCTION PPCC PLOT Y
LOG LOGISTIC PPCC PLOT Y
In addition the following random number generator was added.
LET C = <value>
LET Y = POWER FUNCTION RANDOM NUMBERS FOR I = 1 1 N
-----------------------------------------------------------------
The following enhancements were made to DATAPLOT NOVEMBER, 1994.
-----------------------------------------------------------------
1) The following mathematical library functions were added:
LET A = FRESNS(X) - Fresnel sine integral
LET A = FRESNC(X) - Fresnel cosine integral
LET A = FRESNF(X) - Fresnel auxiliary function f integral
LET A = FRESNG(X) - Fresnel auxiliary function g integral
LET A = SN(X,M) - Jacobian elliptic sn function
LET A = CN(X,M) - Jacobian elliptic cn function
LET A = DN(X,M) - Jacobian elliptic dn function
LET A = PEQ(XR,XI) - the real component of the Weierstrass
elliptic function (equianharmonic case)
LET A = PEQI(XR,XI) - the complex component of the Weierstrass
elliptic function (equianharmonic case)
LET A = PEQ1(XR,XI) - the real component of the first
derivative of the Weierstrass elliptic
function (equianharmonic case)
LET A = PEQ1I(XR,XI) - the complex component of the first
derivative of the Weierstrass elliptic
function (equianharmonic case)
LET A = PLEM(XR,XI) - the real component of the Weierstrass
elliptic function (lemniscatic case)
LET A = PLEMI(XR,XI) - the complex component of the Weierstrass
elliptic function (lemniscatic case)
LET A = PLEM1(XR,XI) - the real component of the first
derivative of the Weierstrass elliptic
function (lemniscatic case)
LET A = PLEM1I(XR,XI) - the complex component of the first
derivative of the Weierstrass elliptic
function (lemniscatic case)
------------------------------------------------------------
Changes prior to this are no longer in the news file
because they are documented in the Reference Manual and
the on-line help.
------------------------------------------------------------
YOU HAVE JUST ACCESSED THE FILE DPNEWF.