Instructions for Using R-project Software for Analyzing Industrial Hygiene Data
(Including Data with Non-Detects and Non-parametric Data)
- Introduction: These are instructions for generating
industrial hygiene metrics using R routines and MS Excel for Windows. MS Excel can
be used to clean up data, group the data, create text files for analysis, and create
tables and charts for final reports. Any word processing,
spreadsheet, or database software could be used to create the text file. The
Statistical Analysis of Non-Detects (sand) package contains R routines that read
text files and generate an output file containing metrics. The metrics
generated are those recommended by the AIHA¹
and are interpreted and used as described in the AIHA book.
- You should have already gone to http://www.csm.ornl.gov/esh/statoed/
and followed the instructions for installing R and the “sand” package. Now you can use R
to analyze data in a text file. The text file must have at least two
columns. One column contains the measured value and the second contains a flag:
0 for a non-detect, 1 for a detect. Each column must have a one-word heading. The following describes one way to
create the text file using MS Excel.
¹ Ignacio, J.S., and W.H. Bullock: A Strategy for Assessing and Managing Occupational Exposures, 3rd ed. Fairfax, Va.: AIHA Press, 2006.
- This example uses data from Table IV.3 of the AIHA book, which is also the
example data in the file aihand.txt included in the download from the “statoed” web
site. The first column, “Monitoring Data (mg/m3)”, contains
a mix of values and the text symbol “<”. The “Value” and “Detected 0=No 1=Yes”
columns were created using a variety of Excel text editing and logic functions.
For example, the Excel formula =IF(LEFT(A2,1)="<",VALUE(REPLACE(A2,1,1,0)),A2)
checks whether the cell starts with “<”; if so, it replaces that symbol with a
leading 0 and converts the result to a number, and otherwise it returns the cell
unchanged. While not important for a small file
like this, these editing functions are very helpful when cleaning up large data
sets.
![AIHA book](images/IMAGE004.jpg)
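The same cleanup can also be sketched in R rather than Excel. This is a minimal illustration in base R only; the sample strings and the column names `Value` and `Detected` are assumptions made for the example, not the Table IV.3 data and not part of the “sand” package.

```r
# Hypothetical raw monitoring results; a leading "<" marks a non-detect
raw <- c("<1.9", "3.2", "2.5", "<1.9", "4.1")

# Flag column: 0 = non-detect, 1 = detect
detected <- ifelse(startsWith(raw, "<"), 0L, 1L)

# Strip a leading "<" (if present) and convert the text to a number
value <- as.numeric(sub("^<", "", raw))

# Two columns with one-word headings, as the text file requires
cleaned <- data.frame(Value = value, Detected = detected)
print(cleaned)
```

For a non-detect, the stored value is the reported detection limit and the flag records that it was censored.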
- Columns F and G contain the values and flags that you need to convert to a text
file. Copy the two columns; open a new file, select “paste special”, and “values”.
![Book 7](images/IMAGE006.jpg)
- In this case Column B has a three-word heading, “Detected 0=No 1=Yes”, which has to
be changed to a one-word heading such as “Detected” or “Flag”. The next step is to save the file
in the rmain folder as a tab-delimited text file and then close it. Click through the screens
warning of the loss of formatting, etc.
![Book 3](images/IMAGE008.jpg)
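If the cleaned columns already exist in R, a tab-delimited file with one-word headings can be written directly. The sketch below uses base R only; the data values and the file name `example.txt` are hypothetical, and a temporary directory stands in for the rmain folder.

```r
# Hypothetical cleaned data with one-word column headings
cleaned <- data.frame(Value = c(1.9, 3.2, 2.5), Detected = c(0L, 1L, 1L))

# Assumption: a temporary directory is used here; in practice the file
# would be saved in the rmain folder described above
out_file <- file.path(tempdir(), "example.txt")

# Tab-delimited, no quoting, no row names
write.table(cleaned, out_file, sep = "\t", quote = FALSE, row.names = FALSE)

# Read the file back to confirm the layout
check <- read.delim(out_file)
print(check)
```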
- Open the R console by double clicking on the icon in the “rmain” folder you
created. Type in the command “aihand<-readss("aihand",L=5)”. The L=5
is the value of the OEL being used to interpret the data. Hit “Enter” and the file is
read and analyzed. Once you have typed in commands, the up and down arrows scroll
through the commands you have used. If you “Save Workspace Image” at the end of
the R session, the commands will be saved. With a large dataset, one often will
group data into subsets based on some variable (location, time, individual, etc.)
and create several text files for analysis. Use the up arrow and edit the file name
in the command line (e.g., Bldg1<-readss("Bldg1",L=5), Bldg2<-readss("Bldg2",L=5),
etc.). When the analyses are complete, the prompt returns.
![RGui](images/IMAGE010.jpg)
- R creates a new comma-delimited text file, “aihandout.csv”, that contains the
metrics. This can be opened with MS Excel, and the two columns can be copied and
pasted into the spreadsheet you will be using for further analysis and report writing.
![rmain](images/IMAGE012.jpg)
- The “readss” command generates the following metrics. The industrial
hygienist chooses those that help interpret the data. Mean and
confidence intervals are useful for decisions on exposure groups and
constructing job and exposure matrices. Upper tolerance limits and percent
exceedance are useful for determining compliance and other day-to-day risk
management decisions. Parametric and non-parametric versions of
each are included.
| Label | Metric | Glossary |
| --- | --- | --- |
| mu | 0.925 | Maximum likelihood estimate (MLE) of the mean of the log-transformed data (log of GM) |
| se.mu | 0.099 | Estimate of the standard error of mu |
| sigma | 0.37 | MLE of the standard deviation of the log-transformed data (log of GSD) |
| se.sigma | 0.079 | Estimate of the standard error of sigma |
| GM | 2.522 | MLE of the geometric mean |
| GSD | 1.447 | MLE of the geometric standard deviation |
| EX | 2.7 | MLE of EX, the (arithmetic) mean |
| LCLa_95 | 2.26 | 95% Lower Confidence Limit (LCL) for EX |
| UCLa_95 | 3.226 | 95% Upper Confidence Limit (UCL) for EX |
| KMmean | 2.773 | Kaplan-Meier (KM) estimate of EX |
| KM.LCL | 2.29 | 95% LCL for KM EX |
| KM.UCL | 3.257 | 95% UCL for KM EX |
| KM.se | 0.269 | Standard error of KMmean |
| Xp.obs | 4.75 | Observed 95th percentile of the data |
| Xp | 4.633 | MLE of the 95th percentile |
| Xp.LCL | 3.521 | MLE of the 95% LCL for Xp |
| Xp.UCL | 6.096 | MLE of the 95% Upper Tolerance Limit (UTL) of Xp |
| NpUTL | NA | Nonparametric estimate of the 95% UTL of Xp |
| Maximum | 5.5 | Largest value in the data set |
| nonDet% | 20 | The percent of Xs that are left censored |
| n | 15 | The number of observations in the data set |
| Rsq | 0.969 | Square of the correlation between the data and the standard lognormal |
| m | 12 | The number of detected Xs |
| f | 3.208 | MLE of the percent exceeding the specified limit L |
| f.LCL | 0.396 | MLE of the 95% LCL for f |
| f.UCL | 14.767 | MLE of the 95% UCL for f |
| fnp | 6.667 | Nonparametric estimate of f for limit L |
| fnp.LCL | 0.341 | Nonparametric estimate of the 95% LCL for f |
| FnUCL_95 | 27.94 | Nonparametric estimate of the 95% UCL for f |
| m2logL | 41.3044 | -2 times the log-likelihood function |
| L | 5 | The specified limit for the percent exceeding; e.g., the OEL |
| P | 0.95 | Percentile for the UTL (p-gamma) |
| gam | 0.95 | One-sided confidence level gamma. Default is 0.95 |
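Several of the point estimates in the table are tied together by the standard lognormal relationships. The sketch below recomputes them in base R from the tabulated mu, sigma, and L. It assumes those textbook formulas and is not the code used by “readss”, so small differences from the table are expected because mu and sigma are rounded.

```r
# Values taken from the table above
mu    <- 0.925   # MLE of the mean of the log-transformed data
sigma <- 0.37    # MLE of the standard deviation of the log-transformed data
L     <- 5       # specified limit (OEL)
p     <- 0.95    # percentile of interest
n     <- 15      # number of observations
m     <- 12      # number of detected values

GM  <- exp(mu)                                    # geometric mean
GSD <- exp(sigma)                                 # geometric standard deviation
EX  <- exp(mu + sigma^2 / 2)                      # arithmetic mean of a lognormal
Xp  <- exp(mu + qnorm(p) * sigma)                 # 95th percentile
f   <- 100 * (1 - pnorm((log(L) - mu) / sigma))   # percent exceeding L
nonDetPct <- 100 * (n - m) / n                    # percent non-detects

print(round(c(GM = GM, GSD = GSD, EX = EX, Xp = Xp, f = f,
              nonDetPct = nonDetPct), 3))
```

The confidence and tolerance limits in the table also depend on the standard errors and the censoring pattern, so they are not reproduced here.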
- R will generate a log probability plot (also called a
Q-Q plot) that provides a visual check of whether the data fit the lognormal
model (see the AIHA book). Creating a log probability plot requires two commands.
The first command, pnd<-plend(aihand), creates a data frame. The second,
qq.lnorm(pnd), generates the probability plot.
This plot displays only detected values and displays replicates as a single data
point, which aids the visual check when the data set is large. Clicking the
camera button copies the image so that it can be pasted into Excel or another
document.
![rmain](images/IMAGE014.jpg)
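For readers who want to see the idea behind a log probability plot, here is a simplified base R sketch. It is not the method used by plend or qq.lnorm: it ignores non-detects entirely and uses hypothetical detected values, so it only illustrates plotting log-transformed data against standard normal quantiles.

```r
# Hypothetical detected values only (non-detects are ignored in this sketch)
detected_values <- c(1.2, 1.8, 2.1, 2.6, 3.0, 3.7, 4.4, 5.5)

# Normal quantiles (x) paired with the log-transformed data (y)
qq <- qqnorm(log(detected_values), plot.it = FALSE)

# Squared correlation between the two; values near 1 suggest lognormality
rsq <- cor(qq$x, qq$y)^2

# Draw the plot with a least-squares reference line
plot(qq$x, qq$y,
     xlab = "Standard normal quantile",
     ylab = "log(value)",
     main = "Log probability plot (simplified sketch)")
abline(lm(qq$y ~ qq$x))
print(rsq)
```

Points that fall close to the straight line are consistent with a lognormal model.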
- Once the metrics and Q-Q plot have been copied into your spreadsheet, you can
continue using them to generate charts and tables needed to support your data
analysis and reporting.
![rmain](images/IMAGE016.jpg)
- The metrics calculated by the “readss” command can also be calculated separately.
The commands for these are described in the help menu, which is shown when you
type “help(sand)”. The “readss” routine requires at least 3 detected results to
run. One function that can be used even when all results are non-detects is “nptl(n , p = 0.95,
gam = 0.95)”, which provides the order (rank) of the value in a data set with n values
that corresponds to the non-parametric upper tolerance limit for the specified
percentile p and confidence level gam.
![rmain](images/IMAGE018.jpg)
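The idea behind a non-parametric upper tolerance limit can be sketched with the binomial distribution. The function below, `np_utl_rank`, is a hypothetical helper written for illustration; it is not the “nptl” routine from the “sand” package and may not match its output in every case. It returns the smallest rank k for which the k-th smallest of n observations is at or above the p-th percentile with confidence at least gam.

```r
# Hypothetical helper, not sand's nptl(): rank of the order statistic that
# serves as a non-parametric UTL for percentile p with confidence gam
np_utl_rank <- function(n, p = 0.95, gam = 0.95) {
  k <- seq_len(n)
  # P(k-th order statistic >= p-th percentile) = P(Binomial(n, p) <= k - 1)
  ok <- pbinom(k - 1, n, p) >= gam
  if (any(ok)) min(k[ok]) else NA_integer_
}

print(np_utl_rank(15))  # NA: 15 observations are too few for a 95%/95% UTL
print(np_utl_rank(59))  # 59: the largest of 59 observations qualifies
```

Under this approach a sample of 15 cannot support a 95%/95% limit, which may be why NpUTL is NA for the example data set.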
This page was last updated on February 14, 2008