Program PRESENCE ver 2.3

Synopsis

This software was developed to enable estimation of the proportion of area occupied (PAO), or equivalently the probability that a site is occupied, by a species of interest, according to the model presented by MacKenzie et al. (2002) 1.

Typically, species are not guaranteed to be detected even when present at a site, hence the naïve estimate of PAO given by:

                # sites where species detected
         PAO = --------------------------------
                    total # sites surveyed

will underestimate the true PAO. MacKenzie et al. (2002) propose that by repeated surveying of the sites, the probability of detecting the species can be estimated, which then enables unbiased estimation of PAO. This model has been extended by MacKenzie et al. (2003)2 to also enable estimation of colonization and local extinction probabilities. These models are briefly discussed here.
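To see why the naïve estimate falls short, here is a minimal Python sketch (not part of PRESENCE). It assumes the single season model described later in this document: a site is occupied with probability psi, and an occupied site yields a detection in survey j with probability p[j]. The function name and the numeric values are hypothetical.

```python
from math import prod


def expected_naive_pao(psi, p):
    """Expected naive PAO: Pr(site occupied AND species detected at least once)."""
    # Probability of at least one detection over all surveys, given occupancy.
    p_star = 1.0 - prod(1.0 - pj for pj in p)
    return psi * p_star


# Hypothetical values: true occupancy 0.6, three surveys with p = 0.3 each.
true_psi = 0.6
naive = expected_naive_pao(true_psi, [0.3, 0.3, 0.3])
print(f"true PAO = {true_psi}, expected naive PAO = {naive:.4f}")
```

With these made-up values the naïve estimate is expected to be about 0.39, well below the true value of 0.6. The gap closes only as the chance of at least one detection approaches 1.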


Installing Program PRESENCE

Starting with Version 2.0 of program PRESENCE, the program is divided into two pieces: An interactive piece where input data are entered, and models are specified and run, and a computational piece: the program which computes the estimates of the specified model. In most cases, users will never need to deal with the computational piece directly as it is called from the interactive piece.

Installation:

Assuming you have succumbed to becoming one of Gates' droids, here is the procedure to install on a Windoze system:

Suggestion: If you are unable to install the program due to access restrictions, install PRESENCE in a different folder (not "C:\Program Files"). You can create a folder named "C:\Pgms" to hold the program, which will not be write-protected like "C:\Program Files".

Program PRESENCE has been run successfully on Macs (using Boot Camp) and Linux PCs (using Wine).

Note: After running PRESENCE, it's a good idea to check to make sure you have the latest version. This can be done automatically in the 'Tools' menu. This might prevent the case of reporting a bug which has already been fixed.

Running the program

Overview: The program can get input data (presence/absence data and site/sample covariate data) from a few different sources.

The 2nd option above is the recommended choice since most spreadsheet programs will automatically back up the data as they are entered. In case the power fails, you might be able to retrieve previously entered data. If you choose the first option, you should periodically save your work.

Once the data are entered into the program, they must be saved (use menu option File/Save) in order to build and run models. The saved file will have an extension (last 3 characters of filename) of "pao". This file will contain both presence/absence data and site and sample covariate data.

To build and run models, a 'project' file is created by the program. This file will contain the results of each model and will have an extension of ".pa2".

To start a new analysis, select 'File/New project' from the menus. A form will appear which will hold the information about the analysis, including title, filenames, data-type, and numbers of sites/occasions/covariates. At this point, you may use a previously-created input file (something.pao) by clicking the 'select file' button, or go to the input screen by clicking the 'Input form' button. Clicking the 'select file' button allows you to navigate to the folder containing the input file and select the file. Clicking the 'Input Form' button displays a new form with a tabbed spreadsheet-like interface.

Input Form

To enter data into this form, click on the first element (site 1, sample #1), and enter '1' (without quotes) if the species was detected at site 1, sample #1, '0' if the species was not detected, or '-' if this site was not sampled. The 'Tab' key will move the cursor to the next sample (or use the mouse) where you can enter the data for site 1, sample 2.

If your data is already entered in a spreadsheet program, you can open that program, select all the site/sample data (no headers or other fields), and click 'Edit/Copy' from the menus. Then, go back to PRESENCE, and click the 'Edit/Paste values' from the menu. If your spreadsheet contains sitenames in the first column, you can include these in the selection-edit-copy, then select 'Edit/Paste w/sitenames' in the PRESENCE menu.

If you have covariate data (e.g., weather or effort data), you can enter these by clicking the appropriate tab and entering data as was done with the presence/absence data.

Also included on the input form is an option of simulating data (single-season model only at this time). This is useful if you are just trying the program and have no data, or if you would like to design an experiment and would like to know how many sites/samples are needed to get a desired level of precision (although, program GENPRES will do a much better job). Both presence/absence data and covariate data may be simulated.

To simulate presence/absence data, click the 'Presence/Absence data' tab, then click 'Generate data' under the 'Simulate' menu. You will be prompted for a value of psi, then values of p (one for each survey or occasion). Optionally, you can enter the value of psi as a linear function of the first covariate (e.g., 0.75 - 0.1*X1), however, you must first simulate the covariate data. Values of p should be separated by commas with no spaces in between.

To simulate covariate data, click the 'Site Covars' tab, then click 'Generate data' under the 'Simulate' menu. You will then be prompted for a mean covariate value, followed by a standard error. Binomial data can be generated by entering '-99' for the mean, followed by the binomial probability (p) for the standard error.
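For readers who want to see what such a simulation involves, here is a rough Python sketch. It is not PRESENCE's own generator: the function names, the seed handling and the use of a normal distribution for the covariate are assumptions made for illustration, and the '-99' binomial convention is not reproduced.

```python
import random


def simulate_histories(n_sites, psi, p, seed=None):
    """Simulate single-season detection histories (lists of 0/1), one per site."""
    rng = random.Random(seed)
    histories = []
    for _ in range(n_sites):
        occupied = rng.random() < psi  # true (unobserved) occupancy state
        # A detection needs an occupied site AND a successful survey.
        row = [1 if (occupied and rng.random() < pj) else 0 for pj in p]
        histories.append(row)
    return histories


def simulate_covariate(n_sites, mean, spread, seed=None):
    """Simulate one site covariate; a normal distribution is assumed here."""
    rng = random.Random(seed)
    return [rng.gauss(mean, spread) for _ in range(n_sites)]


# Hypothetical design: 20 sites, psi = 0.6, four surveys with increasing p.
data = simulate_histories(20, 0.6, [0.2, 0.4, 0.6, 0.8], seed=1)
covariate = simulate_covariate(20, mean=10.0, spread=2.0, seed=1)
for row in data[:3]:
    print("".join(str(v) for v in row))
```

Running a sketch like this many times with different numbers of sites and surveys gives a crude feel for how the data behave, although program GENPRES remains the better tool for study design.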

Once the data are entered (or simulated), click 'File/Save' from the menu, then click 'File/Close'.

Next, click the 'select file' button on the 'enter specifications' form, and use the Windows file selector to navigate to the folder where the saved file resides and select the file.

After filling out the form, click the 'OK' button to create the PRESENCE project file. This will close the specification window and open a 'Results summary' window. You are now ready to compute estimates under pre-defined models, or build your own 'Custom' model.

Model Overview

Currently, there are nine types of models that can be fit to detection/nondetection data within Program PRESENCE. The first is the single season model detailed by MacKenzie et al. (2002)1, which assumes the sites are closed to changes in the state of occupancy for the duration of sampling, and the second is the multiple season extension detailed by MacKenzie et al. (2003)2. Both models require a similar basic sampling situation detailed below. The third type of model computes detection probabilities when there are two species present. The fourth model type is one described by J.A. Royle which uses species counts instead of presence/absence. The multi-method model uses data similar to the multi-season model data, except that each survey within a season represents a different method of detection, instead of just another survey. Multi-state models use additional codes in the detection history data to denote the state of the species in a particular site/survey (0=not detected, 1=detected in state 1, 2=detected in state 2).

The Basic Sampling Scheme

N sites are being surveyed over time where the intent is to establish the presence or absence of a species. The sites may constitute a naturally occurring sampling unit such as a discrete pond or patch of vegetation; a monitoring station; or a quadrat chosen from a predefined area of interest. The occupancy state of sites may change over time; however, during the study there are periods when it is reasonable to assume that, for all sites, no changes are occurring (e.g., within a single breeding season for migratory birds). The study therefore comprises T primary sampling periods (seasons), between which changes in the occupancy state of sites may occur. Within each season, investigators use an appropriate technique to detect the species at k[t] surveys of each site.

The species may or may not be detected during a survey and is not falsely detected when absent. The resulting detection history for each site may be expressed as T vectors of 1’s and 0’s, indicating detection and nondetection of the species respectively. We shall denote the detection history for site i at primary sampling period t as X[i,t], and the complete detection history for site i, over all primary periods, as X[i]. The single season model results when T=1, and the multiple season model for T>1.


Single Season Model

MacKenzie et al. (2002)1 present a model for estimating the site occupancy probability (or PAO) for a target species, in situations where the species is not guaranteed to be detected even when present at a site. Let ψ be the probability a site is occupied and p[j] be the probability of detecting the species in the jth survey, given it is present at the site. They use a probabilistic argument to describe the observed detection history for a site over a series of surveys. For example the probability of observing the history 1001 (denoting the species was detected in the first and fourth surveys of the site) is:

ψ × p[1](1-p[2])(1-p[3])p[4].

The probability of never detecting the species at a site (0000) would therefore be,

ψ × (1-p[1])(1-p[2])(1-p[3])(1-p[4]) + (1-ψ),

which represents the fact that either the species was there, but was never detected, or the species was genuinely absent from the site (1-ψ). By combining these probabilistic statements for all N sites, maximum likelihood estimates of the model parameters can be obtained.

The model framework of MacKenzie et al. (2002) is flexible enough to allow for missing observations: occasions when sites were not surveyed. Missing observations may result by design (it is not logistically possible to always sample all sites), or by accident (a technician's vehicle may break down en route). In effect, a missing observation supplies no information about the detection or nondetection of the species, which is exactly how the model treats such values.
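The history probabilities above, including the treatment of missing observations, can be sketched in a few lines of Python. This is an illustration of the probability statements only, not the code PRESENCE uses. Histories are written as strings of '1', '0' and '-' (missing), and the function names and parameter values are hypothetical.

```python
from itertools import product
from math import log


def history_probability(history, psi, p):
    """Pr(history) under the single season model; '-' marks a missing survey."""
    cond = 1.0  # Pr(observations | site occupied)
    for h, pj in zip(history, p):
        if h == "1":
            cond *= pj
        elif h == "0":
            cond *= 1.0 - pj
        # '-' contributes no factor: a missing survey carries no information
    prob = psi * cond
    if "1" not in history:
        prob += 1.0 - psi  # never detected: the site may be unoccupied
    return prob


def log_likelihood(histories, psi, p):
    """Log-likelihood over all sites, the quantity maximized to get estimates."""
    return sum(log(history_probability(h, psi, p)) for h in histories)


psi, p = 0.6, [0.2, 0.4, 0.6, 0.8]  # hypothetical parameter values
pr_1001 = history_probability("1001", psi, p)
pr_0000 = history_probability("0000", psi, p)
pr_missing = history_probability("1--1", psi, p)
# The probabilities of all 16 complete histories should add up to 1.
total = sum(history_probability("".join(h), psi, p) for h in product("01", repeat=4))
print(pr_1001, pr_0000, pr_missing, total)
```

Maximizing `log_likelihood` over psi and the p's (for example with a general-purpose optimizer) is one way to obtain the maximum likelihood estimates the text refers to.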

The model also enables parameters to be functions of covariates. For example, occupancy probability may be a function of habitat, while detection probability is a function of environmental conditions such as air temperature. The model therefore allows relationships between occupancy state and site characteristics to be investigated. Covariates are entered into the model by way of the logistic model (or logit link).
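As a small illustration of the logit link, the Python sketch below turns a linear function of a covariate into a probability between 0 and 1. The coefficient values and the habitat covariate are made up for the example.

```python
from math import exp, log


def logit(prob):
    """Map a probability in (0, 1) to the whole real line."""
    return log(prob / (1.0 - prob))


def inv_logit(x):
    """Map any real number back to a probability in (0, 1)."""
    return 1.0 / (1.0 + exp(-x))


# Hypothetical coefficients: intercept and slope for a habitat covariate.
beta0, beta1 = -0.5, 1.2
habitat_values = [-1.0, 0.0, 1.0]
psi_by_site = [inv_logit(beta0 + beta1 * x) for x in habitat_values]
print([round(v, 3) for v in psi_by_site])
```

Because the slope here is positive, occupancy probability increases with the covariate, yet it can never leave the (0, 1) interval whatever the covariate value.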

A key assumption of the single season model is that all parameters are constant across sites. Failure of this creates heterogeneity. Unmodeled heterogeneity in detection probabilities will cause occupancy to be underestimated. If there is unmodelled heterogeneity in occupancy probabilities, then it is believed that the estimates will represent an average level of occupancy, provided detection probabilities are not directly related to the probability of occupancy.

Another major assumption of the MacKenzie et al. (2002) model is that the occupancy state of the sites does not change for the duration of the surveying. Situations where this may be violated, for instance, would be for species with large home ranges, where the species may temporarily be absent from the site during the surveying. If this process of temporary absence from the site may be viewed as a random process (e.g., the species tosses a coin to decide whether it will be present at the site today), then this assumption may be relaxed. However, this will alter the interpretation of the model parameters (“occupancy” should be interpreted as “use” and “detection” as “in the site and detected”). More systematic mechanisms for temporary absences may be more problematic and create unknown biases. Users are reminded, however, that the model assumes closure of the sites at the species level, not at the individual level, so there may be some movement of individuals to/from sites without overly affecting the model.

Pre-defined model

There are 6 predefined models that users can run. The “groups” refer to the number of (unknown) groups in the population of occupied sites with different detection probabilities. For instance, “Two Groups” suggests there are 2 types of sites where species have different detection probabilities (perhaps related to species abundance, low or high, say); however, group membership is unknown. Finite mixture models are used to model these unknown groups by introducing parameters representing the probability of being in each group (see Pledger (2000)3, and references therein for details), and this is one approach to allow for heterogeneous detection probabilities. The “Single Group” model is the one discussed by MacKenzie et al. (2002). In each case, detection probabilities (p) may be specified as constant across surveys, or survey specific.

For new users of the program, I'll start with the single-season data-type and describe the process of building/running models. Later, I'll describe how to run the other data-types, although you might be able to skip that part. For the single-season data-type, I can't think of a situation where you would not want to run some of the pre-defined models. To run one of these models, click 'Run/Analysis' from the menus. A form will appear which allows you to pick one of the pre-defined models, or build a custom model. Also on the form are self-explanatory options for the analysis. The pre-defined models include the following models:

Model                          Description
1 group, constant p            Species at all sites/samples are detected with a single probability, p.
1 group, survey-specific p     Detection probability is the same at all sites but differs by sample: sample #1 = p(1), sample #2 = p(2), sample #3 = p(3), ...
2 groups, constant p           There are two subgroups of species which have different detection probabilities, p1 and p2; the proportion of species which have detection probability p1 is α (the remaining proportion have detection probability p2).
2 groups, survey-specific p    There are two subgroups of species which have different detection probabilities at each sample, p1(1), p2(1), p1(2), p2(2), ...; the proportion of species which have detection probability p1(i) is α (the remaining proportion have detection probability p2(i)).
3 groups, constant p           There are three subgroups of species which have different detection probabilities, p1, p2 and p3; the proportion of species which have detection probability p1 is α1 and the proportion which have detection probability p2 is α2 (the remaining proportion have detection probability p3).
3 groups, survey-specific p    There are three subgroups of species which have different detection probabilities at each sample, p1(1), p2(1), p3(1), p1(2), p2(2), p3(2), ...; the proportions which have detection probabilities p1(i) and p2(i) are α1 and α2 (the remaining proportion have detection probability p3(i)).
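To make the group (finite mixture) models more concrete, here is a hedged Python sketch of the probability of a detection history when occupied sites fall into unknown groups. It is an illustration of the idea as described above, not PRESENCE's implementation. The function name, the argument layout and the numeric values are assumptions.

```python
from itertools import product


def mixture_history_probability(history, psi, alpha, p_groups):
    """Pr(history) when occupied sites belong to unknown detection groups.

    alpha    -- group membership probabilities (should sum to 1)
    p_groups -- one list of per-survey detection probabilities per group
    """
    cond = 0.0  # Pr(observations | occupied), averaged over the groups
    for a, p in zip(alpha, p_groups):
        term = 1.0
        for h, pj in zip(history, p):
            if h == "1":
                term *= pj
            elif h == "0":
                term *= 1.0 - pj
        cond += a * term
    prob = psi * cond
    if "1" not in history:
        prob += 1.0 - psi
    return prob


# Hypothetical "2 groups, constant p" model: p1 = 0.2 (30% of occupied sites)
# and p2 = 0.7 (the remaining 70%), with four surveys.
psi = 0.6
alpha = [0.3, 0.7]
p_groups = [[0.2] * 4, [0.7] * 4]
pr_0000 = mixture_history_probability("0000", psi, alpha, p_groups)
total = sum(
    mixture_history_probability("".join(h), psi, alpha, p_groups)
    for h in product("01", repeat=4)
)
print(pr_0000, total)
```

Setting both groups to the same detection probabilities collapses the mixture back to the "1 group" model.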


Custom model

Program PRESENCE allows you to define models which are not included in the pre-defined model set. To allow flexibility, models are defined by using a 'design-matrix'. This design-matrix can be thought of as a translation-table which transforms real-estimated parameters to/from the model parameters (ψ, and p(i)).

In the design-matrix, the real-estimated parameters are represented by the columns, and the model parameters are represented by the rows. As an example, let's look at building a custom model which is equivalent to the first pre-defined model. When you click 'Run/Analysis', then click 'custom model', the pre-defined model names disappear, and a design-matrix form appears. This form contains a tab for the occupancy model parameter (psi), and a tab for the detection model parameters (p1,p2,p3,p4).

If you click the 'occupancy' tab, there will be a spreadsheet with 1 row (psi) and 1 column (a1). The real-estimated parameter is 'a1' and the model parameter is psi, which is computed as psi = 1 * a1. If you click the 'Detection' tab, a spreadsheet will appear which contains 4 rows and 1 column. In the current state, the model parameter p1 will be computed as: p1 = 1 * b1. The model parameters p2, p3, and p4 will also be computed as: p2 = 1 * b1, p3 = 1 * b1, and p4 = 1 * b1. So, the program will estimate 1 parameter, a1, which will yield a value for psi, and 1 parameter, b1, which will yield values for p1, p2, p3, and p4. This is equivalent to the first pre-defined model.

To build a model which is equivalent to the second pre-defined model (1 group, survey-specific P), we need different real-estimated parameters for each detection probability, p1, p2, p3, and p4. To do this, we need to add 3 more columns to the spreadsheet. Click 'Edit/Add cols' from the menus and enter '3' when asked how many columns to add. The spreadsheet should now contain 4 rows and 4 columns. Now, we need to change the numbers in the spreadsheet so that:

p1 = 1 * b1
p2 = 1 * b2
p3 = 1 * b3
p4 = 1 * b4.

Click 'Init/Full Identity' from the menus and the spreadsheet will be filled with 1's on the diagonal, and zeros elsewhere. So, reading the rows of the spreadsheet,

p1 = 1 * b1 + 0 * b2 + 0 * b3 + 0 * b4.
p2 = 0 * b1 + 1 * b2 + 0 * b3 + 0 * b4.
p3 = 0 * b1 + 0 * b2 + 1 * b3 + 0 * b4.
p4 = 0 * b1 + 0 * b2 + 0 * b3 + 1 * b4.

We now have 5 real-estimated parameters (a1,b1,b2,b3,b4), which will be used to compute 5 model parameters (psi,p1,p2,p3,p4). This is equivalent to the second pre-defined model (1 group, survey-specific P).
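The role of the design-matrix can be mimicked in a few lines of Python. The sketch below multiplies each row of a design-matrix by the estimated parameters. It then assumes, based on the logit link mentioned elsewhere in this document, that the result is converted to a probability with the inverse logit, whereas the formulas above write the relationship simply as p1 = 1 * b1. The function names and beta values are hypothetical.

```python
from math import exp


def inv_logit(x):
    return 1.0 / (1.0 + exp(-x))


def model_parameters(design, betas):
    """One model parameter per design-matrix row (logit link assumed)."""
    return [inv_logit(sum(d * b for d, b in zip(row, betas))) for row in design]


# Detection design-matrix for pre-defined model 1: 4 rows (p1..p4), 1 column (b1).
constant_p = [[1], [1], [1], [1]]

# Detection design-matrix for pre-defined model 2: the 4 x 4 identity matrix.
survey_specific_p = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]

p_constant = model_parameters(constant_p, [-0.3])                      # b1 only
p_specific = model_parameters(survey_specific_p, [-1.4, -0.4, 0.4, 1.4])  # b1..b4
print([round(v, 3) for v in p_constant])
print([round(v, 3) for v in p_specific])
```

The first matrix forces p1 through p4 to share one value, while the identity matrix lets each survey have its own.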

A simple Custom Model

As an example of a simple custom model, suppose detection probabilities, p(i), are not constant as in the first pre-defined model, and not different each sample, as in the second pre-defined model. Suppose that the first two detection probabilities are the same, but different from the last two detection probabilities. In this case, we would need to estimate 1 parameter for the first two p's, and another parameter for the second two p's (and a third parameter for psi). Here is how we would want the p's computed:

p1 = 1 * b1 + 0 * b2.
p2 = 1 * b1 + 0 * b2.
p3 = 0 * b1 + 1 * b2.
p4 = 0 * b1 + 1 * b2.

So, the detection spreadsheet would contain 4 rows, and 2 columns (b1,b2) and would look like this:

      b1   b2
p1    1    0
p2    1    0
p3    0    1
p4    0    1

A second Custom Model

In this example, suppose detection probabilities, p(i), are hypothesized to be increasing by a constant amount over the surveys. So, the second detection probability would be equal to the first detection probability plus a constant (X), and the third would be equal to the first + 2*X, and the last detection probability would be equal to the first + 3*X. We would need 2 real-estimated parameters (first detection probability, and X) to compute the 4 model detection parameters. Here is how to write the formulae for the p's:

p1 = 1 * b1 + 0 * b2.
p2 = 1 * b1 + 1 * b2.
p3 = 1 * b1 + 2 * b2.
p4 = 1 * b1 + 3 * b2.

So, the detection spreadsheet would contain 4 rows, and 2 columns (b1,b2) and would look like this:

      b1   b2
p1    1    0
p2    1    1
p3    1    2
p4    1    3
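The two custom detection design-matrices above can be checked with a short Python sketch. As before, this is only an illustration: it assumes the linear combination is on the logit scale (the link this document says is used for custom models), so the "constant amount" in the trend model is a constant step on the logit scale rather than on the probability scale. The beta values are made up.

```python
from math import exp


def inv_logit(x):
    return 1.0 / (1.0 + exp(-x))


def linear_predictor(design, betas):
    """Row-by-row sum of design entries times estimated parameters."""
    return [sum(d * b for d, b in zip(row, betas)) for row in design]


# p1 = p2 (b1) differ from p3 = p4 (b2).
grouped = [[1, 0], [1, 0], [0, 1], [0, 1]]
# Intercept b1 plus 0, 1, 2, 3 steps of size b2.
trend = [[1, 0], [1, 1], [1, 2], [1, 3]]

betas = [-1.0, 0.5]  # hypothetical values for (b1, b2)
p_grouped = [inv_logit(v) for v in linear_predictor(grouped, betas)]
eta_trend = linear_predictor(trend, betas)
p_trend = [inv_logit(v) for v in eta_trend]
print([round(v, 3) for v in p_grouped])
print([round(v, 3) for v in p_trend])
```

Both models use only two estimated detection parameters, yet they express quite different hypotheses about how detection changes over the surveys.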

A third Custom Model (with covariates)

In the previous examples, detection probabilities were assumed to be the same for each site. By using covariates, we can compute a different detection probability for each site (or possibly each site/sample combination). Without covariates, relaxing this assumption would cause us to estimate a large number of parameters (20 for the first pre-defined model, and 80 for the second). By using covariates, we can compute p as:

p(site i, survey j) = 1*b1 + X(i,j)*b2

where X(i,j) is the value of the sample-covariate at site i, sample j. Here, p(i,j) is equal to a base detection probability, (intercept), (1*b1) + an effect (b2) of covariate X(i,j). If b2=0 then there is no effect of the covariate (p=constant). If b2>0 then there is a positive effect of the covariate (higher covariates yield higher p's), and if b2<0 then there is a negative effect of the covariate (higher covariates yield lower p's).

If you had two covariates for each site/sample, you could compute p as:

p(site i, survey j) = 1*b1 + X(i,j)*b2 + Y(i,j)*b3

where Y(i,j) is the value of the 2nd covariate at site i, sample j. The detection design matrix for the case with 2 covariates would look like this:

      b1   b2       b3
p1    1    X(i,j)   Y(i,j)
p2    1    X(i,j)   Y(i,j)
p3    1    X(i,j)   Y(i,j)
p4    1    X(i,j)   Y(i,j)

To run this type of model, click 'Run/Analysis', add columns for the covariates, then, instead of entering '1' in the cells, click on the first cell in a column, and select the covariate name from the 'Init' menu.
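Here is a hedged Python sketch of what a covariate design-matrix does: for every site i and survey j, the row (1, X(i,j), Y(i,j)) is combined with b1, b2 and b3. It assumes the logit link is applied to the result. The covariate values (two sites, four surveys) and the betas are invented for the example.

```python
from math import exp


def inv_logit(x):
    return 1.0 / (1.0 + exp(-x))


def detection_probabilities(X, Y, betas):
    """p(i, j) for every site i and survey j from two sample covariates."""
    b1, b2, b3 = betas
    return [
        [inv_logit(b1 + b2 * x + b3 * y) for x, y in zip(x_row, y_row)]
        for x_row, y_row in zip(X, Y)
    ]


# Hypothetical sample covariates: rows are sites, columns are surveys.
X = [[-1.0, 0.0, 1.0, 2.0], [0.5, 0.5, 0.5, 0.5]]
Y = [[0.0, 1.0, 0.0, 1.0], [1.0, 1.0, 0.0, 0.0]]

p_no_effect = detection_probabilities(X, Y, (-0.2, 0.0, 0.0))  # b2 = b3 = 0
p_x_effect = detection_probabilities(X, Y, (-0.2, 0.8, 0.0))   # b2 > 0
for row in p_x_effect:
    print([round(v, 3) for v in row])
```

Only three parameters are estimated, however many sites and surveys there are, because the covariates carry the site-to-site and survey-to-survey variation.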

A First Example

Now seems like a good time to run through an example. Start program PRESENCE and select 'File/New Project' from the menus. When the 'Enter Specifications' form appears, click the 'Input Data Form' button.

The input data form will contain only 1 tab, for the presence/absence data. We're going to simulate some data in this form. Let's assume we're dealing with a species which has an occupancy rate (ψ) of 0.60 (60% of areas contain at least 1 individual of the species). Also, assume/pretend detection probability is lousy in the beginning, p(1)=0.2, and gets better on each successive sample, p(2)=.4, p(3)=.6, p(4)=.8. This is enough information to generate data for the single-season data-type. To generate presence/absence data, select 'Generate data' from the 'Simulate' menu and enter ψ (0.6) when prompted. Next, enter '0.2,0.4,0.6,0.8' when prompted for the detection probabilities (p) and the table will clear and be filled with randomly generated presence/absence data with those parameters. Click 'File/Save as' and save the file with the name, 'simdata1.pao'. Then, click 'File/Close'.

Next, click the 'Click to select file' button and select the simulated data file we just created (simdata1.pao). The program will fill in the boxes for the filename and results filename. Enter 'simulated data w/ psi=.6, p=.2,.4,.6,.8' in the title box, and click the 'OK' button.

You should see an empty results browser at this point. To run the first pre-defined model, click 'Run/Analysis:single season' from the menus. For the pre-defined models, the program automatically fills in the model name. This name can be anything, but it's best to make it something easily recognized. (I'll describe a common convention for naming later.) In the 'Model' box, you'll see that 'pre-defined' is already selected, and 6 pre-defined models are listed (with the first one selected). Let's start with this model, but first check the 'list data' option. Click 'list-data', then click the 'OK to run' button.

Once you click the 'OK to run' button, you might see another window flash by (perhaps not if you have a fast computer), then a dialog box appears with a short summary of the results of that model. Click 'Yes' to include the output of that model in the results browser. (You might click 'No' in the case where you accidentally run a model which was previously run.) After clicking 'Yes', the summary information from that model is displayed in the results browser.

To view the estimates of psi, and p from this model, use the mouse to position the cursor over the name of the model, '1 group, Constant P', then click with the right mouse button. A pop-up menu will appear. Position the mouse over 'View model output' and click with the left mouse button. This will cause a Notepad window to appear with the results. Look at the output and note the estimates of ψ(Psi), and p. (I got .7267, and .4300, but yours will be different.)

Next, let's run another pre-defined model - one with survey-specific p's. Close the notepad window with the results, then click 'Run/Analysis:single season' from the menus. Click '1 group, survey-specific p' in the 'Model' box (note model name changed for you), then click 'OK to run'. Click 'Yes' to include the results of this model in the results browser, then position the mouse over the model name, right-click, then left-click 'view model output'. In the notepad window, note ψ (Psi), and p(2),p(3),p(4) and p(5) (p(1) is inestimable).


Spatial Dependence in Custom Model

Normally, surveys are supposed to be independent of each other, but frequently, surveys are conducted along trails such that when a species is found at one site, nearby sites have a much higher probability of the species being present than those farther away. This can be accounted for by adding two new parameters, θ0 and θ1. θ0 is the probability that the species is present locally, given the species was not present at the previous site, and θ1 is the probability that the species is present locally, given it was present at the previous site. An example detection history might be:

01011

Here, the species was detected at the 2nd, 4th and last surveys (segments of the transect line), but not detected at the 1st and 3rd surveys. The probability of this history would be represented by:

ψ [(1-θ0)θ0 + θ0(1-p1)θ1] p2 [(1-θ1)θ0 + θ1(1-p3)θ1] p4 θ1 p5

In this model, we assumed that the species was not locally present before the first sample (the first θ is θ0), when in reality, the first segment might be one where the species is locally present. In this case, we would want the first θ to be a value between θ0 and θ1. The value should be the expected value you would get if you randomly picked a survey from all surveys. So, when the spatial dependence model is chosen, there is an option to use the same θ0 for all surveys, or use an 'average' of θ0 and θ1 for the first survey of each site.

When this model is chosen, the θ parameters will appear in the design matrix window, in the same tab as the occupancy (ψ) parameter.
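The history probability for the spatial dependence model can be computed for any history by stepping through the segments and tracking whether the species is locally present. The Python sketch below does this and compares the result with the expression given above for history 01011. It covers only the option where θ0 is used before the first segment, it is not PRESENCE's code, and all names and parameter values are hypothetical.

```python
from itertools import product


def spatial_history_probability(history, psi, theta0, theta1, p):
    """Pr(history) with local presence depending on the previous segment."""
    # Running probabilities of "locally absent" / "locally present" at the
    # current segment, jointly with the observations so far (site occupied).
    absent, present = 1.0, 0.0  # assumed not locally present before segment 1
    for h, pj in zip(history, p):
        to_present = absent * theta0 + present * theta1
        to_absent = absent * (1.0 - theta0) + present * (1.0 - theta1)
        if h == "1":    # a detection requires local presence
            absent, present = 0.0, to_present * pj
        elif h == "0":  # locally absent, or present but missed
            absent, present = to_absent, to_present * (1.0 - pj)
        else:           # missing survey: no information
            absent, present = to_absent, to_present
    prob = psi * (absent + present)
    if "1" not in history:
        prob += 1.0 - psi
    return prob


psi, theta0, theta1 = 0.7, 0.3, 0.8
p = [0.5, 0.6, 0.4, 0.7, 0.5]
p1, p2, p3, p4, p5 = p

stepwise = spatial_history_probability("01011", psi, theta0, theta1, p)
closed_form = (psi * ((1 - theta0) * theta0 + theta0 * (1 - p1) * theta1) * p2
               * ((1 - theta1) * theta0 + theta1 * (1 - p3) * theta1) * p4
               * theta1 * p5)
total = sum(
    spatial_history_probability("".join(h), psi, theta0, theta1, p)
    for h in product("01", repeat=5)
)
print(stepwise, closed_form, total)
```

The stepwise calculation and the written-out expression should agree, and the probabilities of all 32 possible histories should add to 1.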

Multiple Season Model

The multiple season model (MacKenzie et al., 2003) 2 extends the single season model by introducing two additional parameters, ε[t] and γ[t]. These parameters are, respectively, the probability a species becomes locally extinct or colonizes a site between seasons t and t+1.

For example, if the detection history 101 000 was observed at a site (denoting the species was detected in the first and third surveys of the site in the first season; not detected otherwise), the probability of this occurring could be expressed as:

ψ×p[1,1](1-p[1,2])p[1,3] × {(1-ε[1]) (1-p[2,1]) (1-p[2,2]) (1-p[2,3]) + ε[1]}.

This represents the fact that after the first season, the species may have not gone locally extinct (1-ε[1]), but was undetected by the surveying, or the species did go locally extinct (ε[1]) between the first and second seasons.
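The same bookkeeping extends to any multi-season history. The Python sketch below tracks the probability that the site is occupied or unoccupied from season to season and reproduces the expression above for history 101 000. It is an illustrative sketch, not the program's code. The function name, the list layout of eps, gam and p, and the numbers are assumptions.

```python
from itertools import product


def multi_season_probability(seasons, psi, eps, gam, p):
    """Pr(history); seasons is a list of strings, p[t][j] is detection in season t."""
    occ, unocc = psi, 1.0 - psi
    for t, hist in enumerate(seasons):
        if t > 0:
            # Local extinction (eps) and colonization (gam) between seasons.
            occ, unocc = (occ * (1.0 - eps[t - 1]) + unocc * gam[t - 1],
                          occ * eps[t - 1] + unocc * (1.0 - gam[t - 1]))
        detect = 1.0
        for h, pj in zip(hist, p[t]):
            if h == "1":
                detect *= pj
            elif h == "0":
                detect *= 1.0 - pj
        occ *= detect
        if "1" in hist:
            unocc = 0.0  # a detection rules out "unoccupied" for this season
    return occ + unocc


psi, eps, gam = 0.6, [0.2], [0.3]          # hypothetical values
p = [[0.5, 0.4, 0.6], [0.3, 0.5, 0.7]]

stepwise = multi_season_probability(["101", "000"], psi, eps, gam, p)
closed_form = (psi * p[0][0] * (1 - p[0][1]) * p[0][2]
               * ((1 - eps[0]) * (1 - p[1][0]) * (1 - p[1][1]) * (1 - p[1][2]) + eps[0]))
total = 0.0
for bits in product("01", repeat=6):
    h = "".join(bits)
    total += multi_season_probability([h[:3], h[3:]], psi, eps, gam, p)
print(stepwise, closed_form, total)
```

Note that the colonization parameter drops out of this particular history because the site is known to be occupied in the first season.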

The model may also be reparameterized in terms of ψ[t] and ε[t]; or ψ[t] and γ[t], as in some situations this may be a more meaningful parameterization (in terms of overall occupancy) than in terms of the underlying processes. As in the single season model, parameters may be functions of covariates using the logit link.

Note this model does not allow for a so-called “rescue effect”, where the local extinction of a colony is “rescued” by the re-colonization of the site before the unoccupied site can be observed, i.e., the site becomes unoccupied then re-occupied all within a single interval between seasons. Such an effect is sometimes included in metapopulation models; however, while a rescue effect is biologically plausible, it cannot be estimated (without some potentially unrealistic strict assumptions) from the type of data we are considering here, nor from the type of data often collected in metapopulation studies. The main argument for not including a rescue effect is: why should the rescue of the colony be limited to an arbitrary single event, when possibly there may be a number of opportunities between two seasons for the rescue to occur? To reduce the possibility of having unobserved changes in the occupancy state of sites, the sampling scheme should be designed to reflect the appropriate time scale of the system under study.

Alternate Parameterizations

The initial parameterization uses a single initial occupancy parameter, k-1 extinction parameters (assuming k seasons), k-1 colonization parameters, and T detection parameters (assuming T surveys). Once these parameters are estimated, other quantities of interest can be computed. Occupancy in other seasons can be computed as:
Psi(2) = Psi(initial)*(1-eps(1)) + (1-Psi(initial))*gam(1)
Psi(3) = Psi(2)*(1-eps(2)) + (1-Psi(2))*gam(2)
Psi(4) = Psi(3)*(1-eps(3)) + (1-Psi(3))*gam(3)
 :         :        :            :        :
 
Sometimes, it is desirable to model seasonal occupancy as a function of some covariates. Since seasonal occupancy is computed from eps(i) and gam(i), this cannot be done with these parameters.

An alternate parameterization in PRESENCE uses k occupancy parameters, k-1 extinction parameters, and T detection parameters. The k-1 colonization parameters are then computed from the seasonal psi's and eps's by solving the above equations for gam(i).

           Psi(2) - Psi(initial)*(1-eps(1))
 gam(1) = ---------------------------------
                 (1-Psi(initial))
 
By selecting this parameterization, it's now possible to build a model where seasonal occupancy (Psi(i)) is a function of a seasonal covariate.
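The relationship between the two parameterizations can be sketched in Python as follows. The function names and the numeric values are hypothetical. The arithmetic is simply the recursion for Psi(t+1) and its rearrangement for gam(t) given above.

```python
def seasonal_occupancy(psi_initial, eps, gam):
    """Psi for every season from initial occupancy, extinction and colonization."""
    psi = [psi_initial]
    for e, g in zip(eps, gam):
        psi.append(psi[-1] * (1.0 - e) + (1.0 - psi[-1]) * g)
    return psi


def colonization_from_occupancy(psi, eps):
    """Solve the recursion for gam(t), given seasonal Psi's and eps's."""
    return [
        (psi[t + 1] - psi[t] * (1.0 - eps[t])) / (1.0 - psi[t])
        for t in range(len(eps))
    ]


# Hypothetical three-season example.
eps = [0.2, 0.1]
gam = [0.3, 0.4]
psi = seasonal_occupancy(0.6, eps, gam)
gam_back = colonization_from_occupancy(psi, eps)
print([round(v, 4) for v in psi])
print([round(v, 4) for v in gam_back])
```

One thing worth noticing is that when the Psi's and eps's are chosen freely, the derived gam(t) is not guaranteed to fall between 0 and 1, which may be one reason a parameterization can fail to give reasonable estimates.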

Similarly, we could have estimated the colonization parameters and computed the extinction parameters. This parameterization is sometimes useful if the above parameterization fails to converge on reasonable estimates.

Finally, PRESENCE can model extinction and colonization in such a way that the proportion that go locally extinct is the same as the proportion that don't colonize (eps=1-gam).


Two-Species Model

The two-species model (MacKenzie et al., ????) 5 extends the single season model in another way by allowing the computation of occupancy parameters of two species along with conditional probabilities of occupancy when the other species is present or detected. Input data for this model is in the same form as the single-species, single-season model except that the first half of the detection history records are assumed to be species A, and the second half of the records are assumed to be species B. So, if there are 60 sites, the input would consist of 120 detection history records. Records 1-60 would be the site-detection history records for sites 1-60, species A, and records 61-120 would be the site-detection history records for sites 1-60, species B.

Alternate parameterization

Since two of the parameters in the default parameterization are not probabilities bounded by the interval (0 - 1), numerical problems can arise (e.g., if rA is zero, λ would be undefined).

An alternate parameterization was developed which only uses conditional probabilities as parameters, which is more numerically stable. The parameters are:

Using this parameterization, quantities from the other parameterization can be computed, e.g.:

ψB = ψA*ψB1 + (1-ψA)*ψB2
φ = ψA*ψB1/(ψA*ψB)
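These two derived quantities are easy to compute directly. The Python sketch below reads ψB1 as the probability that species B occupies a site given species A is present, and ψB2 as the same probability given species A is absent. That reading is an assumption inferred from the first formula, since the parameter list is not reproduced here. The function name and values are hypothetical.

```python
def derived_two_species(psiA, psiB1, psiB2):
    """Unconditional occupancy of species B and the quantity phi."""
    psiB = psiA * psiB1 + (1.0 - psiA) * psiB2
    phi = psiA * psiB1 / (psiA * psiB)  # written as in the formula above
    return psiB, phi


# Hypothetical values: B is more likely to occur where A is present.
psiB, phi = derived_two_species(psiA=0.5, psiB1=0.8, psiB2=0.4)
print(round(psiB, 4), round(phi, 4))

# If B occurs equally often with or without A, phi equals 1.
_, phi_independent = derived_two_species(psiA=0.5, psiB1=0.6, psiB2=0.6)
print(round(phi_independent, 4))
```

Under this reading, a phi above 1 would indicate that the two species occur together more often than expected if they were independent, and a phi below 1 less often.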


Single-season-Multi-method Model

The multi-method model (MacKenzie et al., ????) 5 extends the single season model by allowing detection probabilities to be different for different methods of observation. This allows the computation of an additional parameter, θ, which is the probability that individuals are available for detection at the site, given that they are present.

Single-season-Multi-state Model

In the multi-state model (MacKenzie et al., ????) 5, two kinds of detections are recorded. Detections where only adults are observed are recorded as '1' in the data, and detections of known breeding adults (adults seen with young) are recorded as '2' in the data. This allows the computation of an additional parameter, γ, which is the probability that adults breed, given that they are present. Input data for this model are in the same form as the single-species, single-season model except that breeding status ('1'=adults only, or '2'=adults and young) is recorded instead of presence ('1').

Royle point-counts Model

The Royle point-count model (Royle, 2004) 6 (Royle and Nichols, 2003) 7 estimates population size from point-count data.

Single Season Output

The results of fitting the single season model to the data are stored in the results database. To view the output of a specific model, position the cursor over the desired model name, click with the right mouse button, then select 'view output' with the left mouse button. If the 'list data' option had been selected, the input data will appear at the beginning of the output. Next come the number of sites, sampling occasions and missing observations in the dataset, followed by the number of parameters in the model, twice the negative log-likelihood and Akaike’s Information Criterion (AIC), e.g.:

Number of sites = 63
Number of sampling occasions = 14
Number of missing observations = 435

Number of parameters = 2
-2log(likelihood) = 606.4220
AIC = 610.4220
For a predefined model, the output then includes a naïve estimate of occupancy (the proportion of sites where the species was detected at least once), the estimated PAO with its standard error in parentheses, the group membership probabilities, and the detection probabilities. Finally, the variance-covariance matrix is printed.
Naive estimate = 0.9365
Proportion of sites occupied = 0.9887 (0.0186)
Probability of group membership = 1.0000
Detection probabilities:
Group 1 = 0.4215

Variance-Covariance Matrix
psi	p(G1)	
0.0003	-0.0001	
-0.0001	0.0006	
The bootstrap estimate of the variance-covariance matrix will follow if that option has been selected.
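
The AIC value in the output header follows from the two lines printed before it: AIC is twice the negative log-likelihood plus twice the number of parameters. A minimal sketch of that arithmetic, using the numbers from the example output (the function name is illustrative):

```python
def aic(neg2_log_lik, n_params):
    """AIC = -2*log(likelihood) + 2*(number of parameters)."""
    return neg2_log_lik + 2 * n_params


# Values taken from the example output above.
single_season_aic = aic(606.4220, 2)
print(f"AIC = {single_season_aic:.4f}")  # AIC = 610.4220
```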

For custom models, after the model AIC has been printed, a note reminds the user that the logistic model has been used to estimate parameters. This is followed by the naïve occupancy estimate and an overall estimate of occupancy (and standard error) derived for the analyzed sites. Next are the estimated coefficients for the logistic model and their variance-covariance matrix. Again, if the bootstrap option for estimating the variance-covariance matrix was selected, the bootstrap estimate appears at the end of the output.


Model has been fit using the logistic link.

Naive estimate = 0.9365

Overall proportion of sites occupied = 0.9887 (0.0186)

Coefficients for site covariates:
		Intercept	4.4750
Coefficients for sampling covariates:
		Intercept	-0.3167

Variance-Covariance Matrix
Intercept	Intercept	
2.7855		-0.0254	
-0.0254		0.0099
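
The coefficients in this output are on the logit scale. For an intercept-only model they can be converted back to probabilities with the inverse logit. The sketch below does this for the two intercepts above and also applies the delta-method approximation for the standard error of occupancy. The delta method is a standard technique assumed here for illustration; the help file does not state how PRESENCE derives the reported standard error. The function names are illustrative.

```python
import math


def inv_logit(x):
    """Inverse logit: exp(x) / (1 + exp(x))."""
    return math.exp(x) / (1.0 + math.exp(x))


def delta_method_se(beta, var_beta):
    """Approximate SE of inv_logit(beta) for a single intercept.

    Uses d/d(beta) inv_logit(beta) = y * (1 - y).  This is an assumed
    approximation, not a documented PRESENCE calculation.
    """
    y = inv_logit(beta)
    return y * (1.0 - y) * math.sqrt(var_beta)


psi = inv_logit(4.4750)                    # intercept for site covariates
p = inv_logit(-0.3167)                     # intercept for sampling covariates
se_psi = delta_method_se(4.4750, 2.7855)   # 2.7855 = variance of the first intercept

print(f"psi = {psi:.4f} ({se_psi:.4f})")
print(f"p   = {p:.4f}")
```

To four decimal places these agree with the overall proportion of sites occupied, 0.9887 (0.0186), and with the detection probability of 0.4215 reported for the predefined model fitted to the same data.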

Multiple Season Output

The results of fitting the multiple season model to the data are written to the “Output” window. As for single season models, the output begins with a listing of the input data (if requested), followed by the number of sites, sampling occasions and missing observations in the dataset, then the number of parameters in the model, twice the negative log-likelihood and Akaike’s Information Criterion (AIC), e.g.:
Open Population Model:

Number of sites = 63
Total number of sampling occasions = 14
Number of primary sampling periods = 5
Number of missing observations = 435

Number of parameters = 7
-2log(likelihood) = 456.5129
AIC = 470.5129
A note is printed as a reminder that the logistic model has been used to estimate parameters, followed by the parameter estimates (coefficients) and their associated variance-covariance matrix. Parameter names correspond to the respective columns in the design matrix, and should be interpreted as for a logistic regression analysis.
Model has been fit using the logistic link.
Coefficients for occupancy covariates:

A1	1.3794
Coefficients for colonization covariates:
Coefficients for local extinction covariates:
		C1	-1.9809
Coefficients for detection covariates:
		D1	-1.7147
		D2	1.0969
		D3	0.7925
		D4	2.2332
		D5	3.7522

Variance-Covariance Matrix

A1	C1	D1	D2	D3	D4	D5	
0.1157	-0.1242	-0.0277	-0.0062	-0.0039	-0.0199	0.0239	
-0.1242	0.2219	0.0259	0.0036	0.0039	0.0445	-0.0230
-0.0277	0.0259	0.2146	-0.2016	-0.2051	-0.2032	-0.2137
-0.0062	0.0036	-0.2016	0.2784	0.2094	0.2067	0.2019	
-0.0039	0.0039	-0.2051	0.2094	0.3322	0.2141	0.2053	
-0.0199	0.0445	-0.2032	0.2067	0.2141	0.3157	0.2045	
0.0239	-0.0230	-0.2137	0.2019	0.2053	0.2045	0.3067
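
Standard errors and correlations of the coefficients can be read off the variance-covariance matrix: a standard error is the square root of a diagonal element, and a correlation is a covariance divided by the product of the two standard errors. A small sketch using the A1 and C1 entries of the matrix above (the variable names are illustrative):

```python
import math

# Upper-left 2x2 block of the matrix printed above (rows/columns A1 and C1).
names = ["A1", "C1"]
vcov = [
    [0.1157, -0.1242],
    [-0.1242, 0.2219],
]

# Standard error = square root of the corresponding diagonal element.
std_errors = {name: math.sqrt(vcov[i][i]) for i, name in enumerate(names)}

# Correlation = covariance / (SE of A1 * SE of C1).
corr_a1_c1 = vcov[0][1] / (std_errors["A1"] * std_errors["C1"])

for name in names:
    print(f"SE({name}) = {std_errors[name]:.4f}")
print(f"corr(A1, C1) = {corr_a1_c1:.4f}")
```

These values are on the logit scale, like the coefficients themselves.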


Single Season Simulation

This simple simulation routine is included so that users may get a general feel for how the model of MacKenzie et al. (2002)1 performs under a specific set of circumstances and sampling designs. Scenarios may either be entered from a tab-delimited ASCII text file (see below for details), or by entering the scenario directly.

Where the previously described simulation procedure is intended as a basic learning tool, this procedure is designed to address a specific question.

Two general sampling designs can be investigated: sampling only a subset of sites more intensively to estimate detection probabilities, or halting the repeated sampling of a site after the species is first detected. Both designs are compatible with the MacKenzie et al. (2002)1 model. A single-group model with constant p is fit to the simulated data. Results are written to a file named 'presence.out' and loaded into the Notepad editor when completed.

This simulation file should be set up as follows (see SimExample.txt).

The first line should consist of 4 integer values;

The next N lines of the file hold the true occupancy and detection probabilities for each site. The first column in each line is the occupancy probability, and the following T columns contain the probability of detecting the species (given presence) during each survey.

The first NI of these lines represent the sites that will be sampled more intensively. For the remaining sites, if T0 < T, PRESENCE still expects to read in T detection probabilities; however, these will not be used.

The final line of the file consists of 3 integer values;

SimExample.txt - sample simulation input file

20	10	5	3		
0.8	0.2	0.4	0.3	0.2	0.6
0.8	0.2	0.4	0.3	0.2	0.6
0.8	0.2	0.4	0.3	0.2	0.6
0.8	0.2	0.4	0.3	0.2	0.6
0.8	0.2	0.4	0.3	0.2	0.6
0.8	0.2	0.4	0.3	0.2	0.6
0.8	0.2	0.4	0.3	0.2	0.6
0.8	0.2	0.4	0.3	0.2	0.6
0.8	0.2	0.4	0.3	0.2	0.6
0.8	0.2	0.4	0.3	0.2	0.6
0.8	0.2	0.4	0.3	0.2	0.6
0.8	0.2	0.4	0.3	0.2	0.6
0.8	0.2	0.4	0.3	0.2	0.6
0.8	0.2	0.4	0.3	0.2	0.6
0.8	0.2	0.4	0.3	0.2	0.6
0.8	0.2	0.4	0.3	0.2	0.6
0.8	0.2	0.4	0.3	0.2	0.6
0.8	0.2	0.4	0.3	0.2	0.6
0.8	0.2	0.4	0.3	0.2	0.6
0.8	0.2	0.4	0.3	0.2	0.6
1	0	500			
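
Before running a simulation it can be useful to check that a scenario file is laid out consistently. The sketch below is a hypothetical helper, not part of PRESENCE. It ASSUMES the four integers on the first line are in the order N, NI, T, T0, which is consistent with the sample file (20 site lines, 5 detection probabilities per line) but is not confirmed by this help file. The three integers on the final line are read but not interpreted.

```python
def parse_simulation_file(text):
    """Parse a simulation scenario laid out as described above.

    ASSUMPTION: the header integers are ordered N, NI, T, T0.
    """
    rows = [line.split() for line in text.splitlines() if line.strip()]
    n_sites, n_intensive, n_surveys, t0 = (int(v) for v in rows[0])

    site_rows = rows[1:1 + n_sites]
    if len(site_rows) != n_sites or len(rows) < n_sites + 2:
        raise ValueError("expected N site lines followed by a final line")

    sites = []
    for row in site_rows:
        values = [float(v) for v in row]
        if len(values) != 1 + n_surveys:
            raise ValueError("each site line needs 1 occupancy + T detection values")
        if not all(0.0 <= v <= 1.0 for v in values):
            raise ValueError("probabilities must lie between 0 and 1")
        sites.append({"psi": values[0], "p": values[1:]})

    final_line = [int(v) for v in rows[1 + n_sites]]  # left uninterpreted
    return {"N": n_sites, "NI": n_intensive, "T": n_surveys, "T0": t0,
            "sites": sites, "final": final_line}


# Rebuild the contents of the sample file shown above as a string.
sample = "\n".join(
    ["20\t10\t5\t3"]
    + ["0.8\t0.2\t0.4\t0.3\t0.2\t0.6"] * 20
    + ["1\t0\t500"]
)
scenario = parse_simulation_file(sample)
print(scenario["N"], scenario["T"], len(scenario["sites"]), scenario["final"])
```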

Tools and Settings

Help menu

Problems/Questions

The most frequent question concerns a message in the output warning that the optimization routine may not have reached the maximum likelihood value. The program then prints the number of significant digits achieved when the optimization stopped. This situation occurs when the optimization routine is trying to find the maximum of a function that is relatively 'flat' (thinking in two dimensions). If you look at the logit transformation described in this help file, you'll see that this can occur if a parameter is near zero or one. When a parameter is near zero, the 'untransformed' or 'beta' parameter associated with it is a large negative number. To get the 'real' parameter, the untransformed parameter is plugged into the logit equation, exp(x)/(1+exp(x)). Plugging in -30 for x in this equation gives almost the same value as plugging in -40. This causes the optimization routine to think the function is 'flat', and it gives the warning message.
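
The flatness described above is easy to see numerically. The following sketch evaluates the logit equation at a few values of x (the function name is illustrative):

```python
import math


def inv_logit(x):
    """The logit equation from the text: exp(x) / (1 + exp(x))."""
    return math.exp(x) / (1.0 + math.exp(x))


for x in (-40.0, -30.0, -5.0, 0.0):
    print(f"x = {x:6.1f} -> probability = {inv_logit(x):.3e}")

# Moving x from -40 to -30 changes the probability by a negligible amount.
difference = abs(inv_logit(-30.0) - inv_logit(-40.0))
print(f"difference between x = -30 and x = -40: {difference:.3e}")
```

Both -30 and -40 give probabilities that are zero for all practical purposes, so changing the untransformed parameter over that range barely changes the likelihood.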

Examining many simulated and real datasets, we have found that this message can be safely ignored if the number of significant digits is 3 or larger. The number of significant digits reported is not the number of digits you can trust in the parameter estimates. We have found that even when it reports 2 significant digits, the estimates are accurate to 4 or more decimal places.

When the message occurs with the number of significant digits less than two, it usually indicates insufficient data for the desired model, or model overparameterization. This is sometimes accompanied by a warning about the variance-covariance matrix. If this happens, the model may need to be simplified.

In some cases, poor starting values for the parameters can cause the problems noted above. This can be solved by giving the program better initial values when running the model. For example, if detection probabilities are very small, the default starting values of 0.5 are far from the final parameter values, and the optimization routine may fail. The solution is to supply small initial values (on the logit scale) for the model so that the optimization routine does not have to search very far. Since simpler models converge more readily than complex ones, it is usually best to start with simple models, so that their estimates are available as starting values for complex ones if needed.
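
Because initial values are given on the logit scale, a probability has to be converted before it is entered. A minimal sketch of the conversion, assuming a guessed detection probability of 0.05 (a hypothetical value; the function name is illustrative):

```python
import math


def logit(prob):
    """Logit transform: log(prob / (1 - prob))."""
    return math.log(prob / (1.0 - prob))


default_start = logit(0.5)    # the default probability of 0.5 is 0 on the logit scale
small_p_start = logit(0.05)   # a guessed small detection probability

print(f"logit(0.50) = {default_start:.4f}")
print(f"logit(0.05) = {small_p_start:.4f}")
```

A detection probability near 0.05 therefore corresponds to an initial value of roughly -3 rather than 0.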


Resources

Credits/Acknowledgments

PRESENCE was developed by Darryl MacKenzie of Proteus Research & Consulting Ltd. under contract to U.S. Geological Survey as part of their Amphibian Research and Monitoring Initiative.

Version 2 of PRESENCE was developed by Jim Hines of the U.S. Geological Survey.

Currently, we don’t know of any bugs in PRESENCE, although that doesn’t mean there aren’t any (yes, detection probability is less than 1.0!). If you find some, feel free to let us know.

Jim Hines  jhines@usgs.gov
Darryl MacKenzie  darryl@proteus.co.nz

Appendix

Covariates

Program PRESENCE makes the distinction between two types of covariates. Site-specific covariates are covariates that are constant for a site within a season. Examples would be habitat type, patch size, distance to nearest patch, or generalized weather patterns such as drought or El Niño years. Sampling-occasion covariates are covariates that may change with each survey of a site, for example local environmental conditions such as temperature, precipitation or cloud cover; time of day; or observer. Covariates are entered into the models using the logistic model.

Detection probabilities may be functions of either site-specific or sampling occasion covariates, while all other parameters may be functions of site-specific covariates only.

Sampling-occasion covariates may be missing, and are assumed to correspond to a missing detection/nondetection observation. When a covariate is being used that has missing values that do not correspond with a missing detection/nondetection observation, the detection/nondetection data are also treated as missing. Site-specific covariates cannot have missing values, unless the site was never surveyed during that season.

An important note about continuous covariates! Because of the way the logit link works, if the average value of a covariate is a long way from zero, PRESENCE may not be able to find the true maximum likelihood estimates of the model parameters, which will give you bogus results. Indications that there might be a problem are that the estimates themselves look suspicious, that the variance-covariance matrix includes a huge value, and/or that you get a warning about a non-invertible variance-covariance matrix. The best approach is to transform your data onto another scale which is still meaningful to you. You could divide the covariate values by some constant (e.g., rather than entering 80% humidity as 80.0, use 0.80); subtract the average of the covariate from each observed value (i.e., X* = X – average(X’s)); or use some combination of the two. Such transformations are not carried out by PRESENCE automatically, but can be done easily with a spreadsheet and the modified values pasted back into the Data Window.
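
The two transformations described above can also be done with a few lines of Python instead of a spreadsheet. This is a sketch only; the function name and the humidity values are hypothetical.

```python
def rescale(values, divisor=1.0, center=False):
    """Divide each value by a constant and, optionally, subtract the mean."""
    scaled = [v / divisor for v in values]
    if center:
        mean = sum(scaled) / len(scaled)
        scaled = [v - mean for v in scaled]
    return scaled


humidity_pct = [80.0, 65.0, 90.0, 72.0]   # hypothetical readings in percent

as_proportion = rescale(humidity_pct, divisor=100.0)           # 80.0 becomes 0.80
centered = rescale(humidity_pct, divisor=100.0, center=True)   # mean moved to zero

print(as_proportion)
print([round(v, 4) for v in centered])
```

After centering, the average of the covariate is zero, so the intercept of the logistic model describes a site with an average covariate value.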

Logistic Model or Logit Link

The logistic model can be used to investigate potential relationships between probabilities (the response) and covariates (the explanatory variables), as it ensures that response values stay between 0 and 1. The logistic model is defined as:

log_e(y/(1-y)) = Xβ,

where y is the probability; X is a row vector containing the covariate values; and β is a column vector of coefficient values that are to be estimated. An alternative definition for the model is, y = exp(Xβ) / (1+exp(Xβ)).

Large positive values for Xβ make y tend to 1, while large negative values make y tend to 0. If Xβ = 0, then y = 0.5.
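
As an illustration of the definition above, the sketch below computes y from a row vector X and a coefficient vector β. The covariate and coefficient values are hypothetical, and the function name is illustrative. The two algebraically equivalent forms of the formula are used so that very large positive or negative values of Xβ do not cause a floating-point overflow.

```python
import math


def logistic_probability(x_row, beta):
    """Return y = exp(X*beta) / (1 + exp(X*beta)) for one row of covariates."""
    if len(x_row) != len(beta):
        raise ValueError("X and beta must have the same length")
    linear = sum(x * b for x, b in zip(x_row, beta))
    if linear >= 0:
        # Equivalent form that avoids exp() of a large positive number.
        return 1.0 / (1.0 + math.exp(-linear))
    return math.exp(linear) / (1.0 + math.exp(linear))


# Hypothetical example: an intercept plus one covariate (humidity as 0.80).
x_row = [1.0, 0.80]
beta = [-1.0, 2.0]
y = logistic_probability(x_row, beta)
print(f"X*beta = 0.6 gives y = {y:.4f}")

# The limiting behaviour described in the text.
print(logistic_probability([1.0], [0.0]))      # X*beta = 0 gives 0.5
print(logistic_probability([1.0], [1000.0]))   # tends to 1
print(logistic_probability([1.0], [-1000.0]))  # tends to 0
```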


Literature Cited

1. MacKenzie, D. I., J. D. Nichols, G. B. Lachman, S. Droege, J. A. Royle and C. A. Langtimm. 2002. Estimating site occupancy rates when detection probabilities are less than one. Ecology 83(8): 2248-2255.

2. MacKenzie, D. I., J. D. Nichols, J. E. Hines, M. G. Knutson and A. B. Franklin. 2003. Estimating site occupancy, colonization and local extinction when a species is detected imperfectly. Ecology 84(8): 2200-2207.

3. Pledger, S. 2000. Unified maximum likelihood estimates for closed capture-recapture models using mixtures. Biometrics 56: 434-442.

4. Burnham, K. P. and D. R. Anderson. 1998. Model selection and inference. Springer-Verlag, New York, USA.

5. MacKenzie, D. I., ???. Estimating .... (???)

6. Royle, J. A. 2004. N-mixture models for estimating population size from spatially replicated counts. Biometrics 60: 108-115.

7. Royle, J. A. and J. D. Nichols. 2003. Estimating abundance from repeated presence-absence data or point counts. Ecology 84(3): 777-790.

8. MacKenzie, D. I., J. D. Nichols, J. A. Royle, K. Pollock, L. Bailey and J. E. Hines. 2006. Occupancy Estimation and Modeling - Inferring Patterns and Dynamics of Species Occurrence. Elsevier Publishing.