Gray, BR, RJ Haro and JT Rogala. 2010. Addressing among-group variation in covariate effects using multilevel models. Environmental and Ecological Statistics 17: 573–591.

Gray BR. 2012. Variance components estimation for continuous and discrete data, with emphasis on cross-classified sampling designs. In Gitzen RA, JJ Millspaugh, AB Cooper and DS Licht (eds.), Design and analysis of long-term ecological monitoring studies. Cambridge University Press, Cambridge, pp. 200-227.

**SAS code for fitting zero-inflated binomial / site occupancy models**

This site provides SAS code for fitting zero-inflated binomial (ZIB) / site occupancy models. ZIB models may also be fitted using R (unmarked link), MARK and PRESENCE. Code is provided to fit models where:

- detection probabilities (p) are either constant or vary randomly by site
- p varies as logit-normal or beta-binomial random variables
- inferences are under frequentist (via maximum likelihood (ML)) or Bayesian (via Markov chain Monte Carol (MCMC)) assumptions.

A note on my use of the terms “site occupancy” and “zero-inflated binomial.” The term “site occupancy” has a subject-matter connotation and so may be meaningless to statisticians not familiar with quantitative wildlife ecology. Otoh, the term “zero-inflated binomial” is less commonly used among ecologists (the subject-matter field in which these models see greatest use) and also has the sense of being marginal in the site occupancy / Bernoulli parameter (the latter is a concern because, under MCMC, the probability of each site being occupied may be estimated). Since neither term is clearly best, since I believe many ecologists can parse the “zero-inflated binomial” term and since “ZIB” is terrific shorthand, I’ve used “zero-inflated binomial” or “ZIB” throughout.

The code supplied is generally sparse (the user will probably want to add options), and notation is not consistent across sets of code. Also, I don’t demonstrate the use of fixed covariates. However, fixed covariates may be added to models associated with any of the code supplied.

Please send comments and suggestions to Brian Gray . Contributions of code will be especially welcomed, and code contributions that are posted will be acknowledged.

__ZIB model, p constant or varies by fixed covariates (the “standard” site occupancy model)__

__Maximum likelihood using PROC FMM__

proc fmm data=datasetname seed=12345;

model y / n = / dist=binomial;

model + / dist=constant;

run;

__Maximum likelihood using PROC NLMIXED__

proc nlmixed data=datasetname;

parms bpsi_0 bp_0 0;

eta_psi = bpsi_0;

psi = 1 / (1 + exp(-eta_psi));

eta_p = bp_0;

p = 1 / (1 + exp(-eta_p));

y=detects;

m=surveys;

if y = 0 then prob_0 = (1-psi) + psi*((1-p)**m);

if y = 0 then loglike = log(prob_0);

else loglike = log(psi) /*+ log(comb(m,y))*/ + y*log(p) + (m-y)*log(1-p);

model y ~ general(loglike);

estimate "psi" 1/(1+exp(-bpsi_0));

estimate "p" 1/(1+exp(-bp_0));

run;

__MCMC using PROC FMM__

proc fmm data=zibdata seed=12345;

model y / n = / dist=binomial;

model + / dist=constant;

bayes;

run;

__MCMC using PROC MCMC__

proc mcmc data=datasetname outpost=zibout dic seed=1 monitor=(_parms_);

parms b0psi_ b0p;

prior b0psi_ b0p ~ n(0, prec=0.1);

beginnodata;

psi_ = 1/(1+exp(-b0psi_));

p = 1/(1+exp(-b0p));

endnodata;

random zz ~ bern(psi_) subject=i monitor=(zz_1-zz_5);

eff_p = zz * p;

model y ~ binomial(n, eff_p);

run;

__Site occupancy model where the binomial parameter varies as a beta random variable__

__Maximum likelihood__

A number of formulations of the zero-inflated beta-binomial (ZIBB) model are conceivable. None have worked consistently for me under ML (at least as I’ve coded them in SAS).

Here’s an adaptation of beta binomial code supplied by Nelson et al. (2006):

proc nlmixed data=datasetname method=GAUSS NOAD fd qpoints=30;

parms bpsi0 bp0 0 rho=.6;

bounds 0 < rho < 1;

n=_freq_; * n = surveys per site;

psi = exp(bpsi0) / (1 + exp(bpsi0)) ;

pi = exp(bp0) / (1 + exp(bp0)) ;

alpha = pi*(1-rho)/rho ;

beta = (1- pi)*(1-rho)/rho ;

random a_i ~ normal(0,1) subject=sitecd;

prob = CDF('NORMAL',a_i) ;

p_i = quantile('beta',prob,alpha,beta);

if y = 0 then like = (1-psi) + psi*((1-p_i)**n);

else like = psi*(p_i**y) * ((1-p_i)**(n-y));

if like > 1e-8 then loglike = log(like);

else loglike=-1e20;

model y ~ general(loglike);

estimate "psi^" psi;

estimate "meanp^" pi;

estimate "alpha^" pi*(1-rho)/rho;

estimate "beta^" (1-pi)*(1-rho)/rho;

estimate "varp^(1)" alpha*beta/((alpha+beta)**2 * (alpha+beta+1)); * from Mark, above;

estimate 'var_p^(2)' pi*(1-pi)*rho; * from Royle 2006 (need more definitive reference);

run;

Contact me directly for code for fitting ZIBB models using the method described by Morgan et al. (2007) or a zero-inflated elaboration of Nelson et al’s numerically integrated beta-binomial marginal likelihood.

__MCMC__

Thanks to Fang Chen, SAS, for helping develop this code. Requires SAS 9.3. Mixing may not be great.

proc mcmc data=datasetname outpost=zibbout seed=1 monitor=(_parms_ mu_p alpha_ beta_) dic;

parms psi_ b00p b0rho_p;

prior b00p b0rho_p ~ n(0, prec=0.5);

prior psi_ ~ uniform(0, 1);

beginnodata;

mu_p = 1/(1+exp(-b00p));

rho_p= 1/(1+exp(-b0rho_p));

alpha_=mu_p*(1-rho_p)/rho_p;

beta_=(1-rho_p)*(1-mu_p) / rho_p;

var_p = alpha_*beta_/((alpha_+beta_)**2 * (alpha_+beta_+1));

endnodata;

random p ~ beta(alpha_, beta_) subject=site;* monitor=(p_3-p_5);

random z_ ~ bern(psi_) subject=site;* monitor=(z__3-z__5);

eff_p = z_ * p;

model y ~ binomial(subsite, eff_p);

run;

__Site occupancy model where the binomial parameter varies as a logit-normal random variable__

__Maximum likelihood__

Watch for boundary solutions (psi and/or mean p = 0 or 1) under this model.

proc nlmixed data=datasetname;

parms bpsi0 bp0 0 taup 1;

eta_psi = bpsi0;

psi = exp(eta_psi)/(1 + exp(eta_psi));

eta_p = bp0 + u0;

p = exp(eta_p)/(1 + exp(eta_p));

random u0 ~ N(0,taup**2) subject = group out=reffects;

m = _FREQ_; * define number surveys per site;

if y = 0 then loglike = log((1-psi) + psi*(1-p)**m); * see Johnson et al 1993, p315;

else loglike = log(psi) + /*log(comb(m,y)) +*/ y*log(p) + (m-y)*log(1-p); * add remarked text for full likelihood;

model y ~ general(loglike);

estimate "psi^" 1/(1+exp(-bpsi0));

estimate "median p^" 1/(1+exp(-bp0));

* estimate marginal or mean slope for zero process (after Johnson & Kotz 1970, p6, via Zeger, Liang and Albert 1988, p1054);

c = 16*sqrt(3)/(15*constant('pi'));

al_D = (1+(c**2)*taup**2)**(-.5);

estimate "mean p^" 1/(1+exp(-al_D*bp0));

estimate "varp^" taup**2;

run;

__MCMC__

Thanks to Fang Chen, SAS, for helping develop this code. Requires SAS 9.3 and, if the slice sampler option is used, R12.1.

proc mcmc data=datasetname monitor=(_parms_ psi_) nmc=10000 outpost=zibout seed=1 dic;

parms b0psi b00p_ /slice;

parms tau;

begincnst;

m = 67; * number sites;

endcnst;

beginnodata;

prior b0psi b00p_ ~ n(0, prec=0.5);

prior tau ~ gamma(1, iscale=2);

psi_ = 1/(1+exp(-b0psi));

endnodata;

random b0p ~ normal(b00p_, prec=tau) subject=site monitor=(random(5));

random z_ ~ bern(psi_) subject=site monitor=(random(5));

p = 1/(1+exp(-b00p_));

eff_p = z_ * p;

model y ~ binomial(subsite, eff_p);

run;

__Royle-Nichols model__

This code was initially developed by Julia Molony and Mark Holland, with subsequent modifications by Jill Tao, SAS, and me. The algorithm needs work and contributions will be appreciated. Download code with example.

__Dynamic site occupancy models__

I haven’t seen SAS code for fitting dynamic site occupancy models.

__References__

Nelson KP, SR Lipsitz, GM Fitzmaurice, J Ibrahim, M Parzen and R Strawderman. 2006. Use of the probability integral transformation to fit nonlinear mixed-effects models with nonnormal random effects. J Computational Graphical Statistics 15: 39–57.

Morgan BJT, DJ Revell and SN Freeman. 2007. A note on simplifying likelihoods for site occupancy models. Biometrics 63: 618–621.

** Disclaimer: ** Although these programs have been used by the U.S. Geological Survey (USGS), no warranty, expressed or implied, is made by the USGS or the U.S. Government as to the accuracy and functioning of the programs and related program material nor shall the fact of distribution constitute any such warranty, and no responsibility is assumed by the USGS in connection therewith.

Gray, BR, RJ Haro and JT Rogala. 2010. Addressing among-group variation in covariate effects using multilevel models. Environmental and Ecological Statistics 17: 573–591.

Gray BR. 2012. Variance components estimation for continuous and discrete data, with emphasis on cross-classified sampling designs. In Gitzen RA, JJ Millspaugh, AB Cooper and DS Licht (eds.), Design and analysis of long-term ecological monitoring studies. Cambridge University Press, Cambridge, pp. 200-227.

**SAS code for fitting zero-inflated binomial / site occupancy models**

This site provides SAS code for fitting zero-inflated binomial (ZIB) / site occupancy models. ZIB models may also be fitted using R (unmarked link), MARK and PRESENCE. Code is provided to fit models where:

- detection probabilities (p) are either constant or vary randomly by site
- p varies as logit-normal or beta-binomial random variables
- inferences are under frequentist (via maximum likelihood (ML)) or Bayesian (via Markov chain Monte Carol (MCMC)) assumptions.

A note on my use of the terms “site occupancy” and “zero-inflated binomial.” The term “site occupancy” has a subject-matter connotation and so may be meaningless to statisticians not familiar with quantitative wildlife ecology. Otoh, the term “zero-inflated binomial” is less commonly used among ecologists (the subject-matter field in which these models see greatest use) and also has the sense of being marginal in the site occupancy / Bernoulli parameter (the latter is a concern because, under MCMC, the probability of each site being occupied may be estimated). Since neither term is clearly best, since I believe many ecologists can parse the “zero-inflated binomial” term and since “ZIB” is terrific shorthand, I’ve used “zero-inflated binomial” or “ZIB” throughout.

The code supplied is generally sparse (the user will probably want to add options), and notation is not consistent across sets of code. Also, I don’t demonstrate the use of fixed covariates. However, fixed covariates may be added to models associated with any of the code supplied.

Please send comments and suggestions to Brian Gray . Contributions of code will be especially welcomed, and code contributions that are posted will be acknowledged.

__ZIB model, p constant or varies by fixed covariates (the “standard” site occupancy model)__

__Maximum likelihood using PROC FMM__

proc fmm data=datasetname seed=12345;

model y / n = / dist=binomial;

model + / dist=constant;

run;

__Maximum likelihood using PROC NLMIXED__

proc nlmixed data=datasetname;

parms bpsi_0 bp_0 0;

eta_psi = bpsi_0;

psi = 1 / (1 + exp(-eta_psi));

eta_p = bp_0;

p = 1 / (1 + exp(-eta_p));

y=detects;

m=surveys;

if y = 0 then prob_0 = (1-psi) + psi*((1-p)**m);

if y = 0 then loglike = log(prob_0);

else loglike = log(psi) /*+ log(comb(m,y))*/ + y*log(p) + (m-y)*log(1-p);

model y ~ general(loglike);

estimate "psi" 1/(1+exp(-bpsi_0));

estimate "p" 1/(1+exp(-bp_0));

run;

__MCMC using PROC FMM__

proc fmm data=zibdata seed=12345;

model y / n = / dist=binomial;

model + / dist=constant;

bayes;

run;

__MCMC using PROC MCMC__

proc mcmc data=datasetname outpost=zibout dic seed=1 monitor=(_parms_);

parms b0psi_ b0p;

prior b0psi_ b0p ~ n(0, prec=0.1);

beginnodata;

psi_ = 1/(1+exp(-b0psi_));

p = 1/(1+exp(-b0p));

endnodata;

random zz ~ bern(psi_) subject=i monitor=(zz_1-zz_5);

eff_p = zz * p;

model y ~ binomial(n, eff_p);

run;

__Site occupancy model where the binomial parameter varies as a beta random variable__

__Maximum likelihood__

A number of formulations of the zero-inflated beta-binomial (ZIBB) model are conceivable. None have worked consistently for me under ML (at least as I’ve coded them in SAS).

Here’s an adaptation of beta binomial code supplied by Nelson et al. (2006):

proc nlmixed data=datasetname method=GAUSS NOAD fd qpoints=30;

parms bpsi0 bp0 0 rho=.6;

bounds 0 < rho < 1;

n=_freq_; * n = surveys per site;

psi = exp(bpsi0) / (1 + exp(bpsi0)) ;

pi = exp(bp0) / (1 + exp(bp0)) ;

alpha = pi*(1-rho)/rho ;

beta = (1- pi)*(1-rho)/rho ;

random a_i ~ normal(0,1) subject=sitecd;

prob = CDF('NORMAL',a_i) ;

p_i = quantile('beta',prob,alpha,beta);

if y = 0 then like = (1-psi) + psi*((1-p_i)**n);

else like = psi*(p_i**y) * ((1-p_i)**(n-y));

if like > 1e-8 then loglike = log(like);

else loglike=-1e20;

model y ~ general(loglike);

estimate "psi^" psi;

estimate "meanp^" pi;

estimate "alpha^" pi*(1-rho)/rho;

estimate "beta^" (1-pi)*(1-rho)/rho;

estimate "varp^(1)" alpha*beta/((alpha+beta)**2 * (alpha+beta+1)); * from Mark, above;

estimate 'var_p^(2)' pi*(1-pi)*rho; * from Royle 2006 (need more definitive reference);

run;

Contact me directly for code for fitting ZIBB models using the method described by Morgan et al. (2007) or a zero-inflated elaboration of Nelson et al’s numerically integrated beta-binomial marginal likelihood.

__MCMC__

Thanks to Fang Chen, SAS, for helping develop this code. Requires SAS 9.3. Mixing may not be great.

proc mcmc data=datasetname outpost=zibbout seed=1 monitor=(_parms_ mu_p alpha_ beta_) dic;

parms psi_ b00p b0rho_p;

prior b00p b0rho_p ~ n(0, prec=0.5);

prior psi_ ~ uniform(0, 1);

beginnodata;

mu_p = 1/(1+exp(-b00p));

rho_p= 1/(1+exp(-b0rho_p));

alpha_=mu_p*(1-rho_p)/rho_p;

beta_=(1-rho_p)*(1-mu_p) / rho_p;

var_p = alpha_*beta_/((alpha_+beta_)**2 * (alpha_+beta_+1));

endnodata;

random p ~ beta(alpha_, beta_) subject=site;* monitor=(p_3-p_5);

random z_ ~ bern(psi_) subject=site;* monitor=(z__3-z__5);

eff_p = z_ * p;

model y ~ binomial(subsite, eff_p);

run;

__Site occupancy model where the binomial parameter varies as a logit-normal random variable__

__Maximum likelihood__

Watch for boundary solutions (psi and/or mean p = 0 or 1) under this model.

proc nlmixed data=datasetname;

parms bpsi0 bp0 0 taup 1;

eta_psi = bpsi0;

psi = exp(eta_psi)/(1 + exp(eta_psi));

eta_p = bp0 + u0;

p = exp(eta_p)/(1 + exp(eta_p));

random u0 ~ N(0,taup**2) subject = group out=reffects;

m = _FREQ_; * define number surveys per site;

if y = 0 then loglike = log((1-psi) + psi*(1-p)**m); * see Johnson et al 1993, p315;

else loglike = log(psi) + /*log(comb(m,y)) +*/ y*log(p) + (m-y)*log(1-p); * add remarked text for full likelihood;

model y ~ general(loglike);

estimate "psi^" 1/(1+exp(-bpsi0));

estimate "median p^" 1/(1+exp(-bp0));

* estimate marginal or mean slope for zero process (after Johnson & Kotz 1970, p6, via Zeger, Liang and Albert 1988, p1054);

c = 16*sqrt(3)/(15*constant('pi'));

al_D = (1+(c**2)*taup**2)**(-.5);

estimate "mean p^" 1/(1+exp(-al_D*bp0));

estimate "varp^" taup**2;

run;

__MCMC__

Thanks to Fang Chen, SAS, for helping develop this code. Requires SAS 9.3 and, if the slice sampler option is used, R12.1.

proc mcmc data=datasetname monitor=(_parms_ psi_) nmc=10000 outpost=zibout seed=1 dic;

parms b0psi b00p_ /slice;

parms tau;

begincnst;

m = 67; * number sites;

endcnst;

beginnodata;

prior b0psi b00p_ ~ n(0, prec=0.5);

prior tau ~ gamma(1, iscale=2);

psi_ = 1/(1+exp(-b0psi));

endnodata;

random b0p ~ normal(b00p_, prec=tau) subject=site monitor=(random(5));

random z_ ~ bern(psi_) subject=site monitor=(random(5));

p = 1/(1+exp(-b00p_));

eff_p = z_ * p;

model y ~ binomial(subsite, eff_p);

run;

__Royle-Nichols model__

This code was initially developed by Julia Molony and Mark Holland, with subsequent modifications by Jill Tao, SAS, and me. The algorithm needs work and contributions will be appreciated. Download code with example.

__Dynamic site occupancy models__

I haven’t seen SAS code for fitting dynamic site occupancy models.

__References__

Nelson KP, SR Lipsitz, GM Fitzmaurice, J Ibrahim, M Parzen and R Strawderman. 2006. Use of the probability integral transformation to fit nonlinear mixed-effects models with nonnormal random effects. J Computational Graphical Statistics 15: 39–57.

Morgan BJT, DJ Revell and SN Freeman. 2007. A note on simplifying likelihoods for site occupancy models. Biometrics 63: 618–621.

** Disclaimer: ** Although these programs have been used by the U.S. Geological Survey (USGS), no warranty, expressed or implied, is made by the USGS or the U.S. Government as to the accuracy and functioning of the programs and related program material nor shall the fact of distribution constitute any such warranty, and no responsibility is assumed by the USGS in connection therewith.