Beyond the exponential family

Eric Pedersen, Gavin Simpson, David Miller, Noam Ross
August 5th, 2017

Away from the exponential family

Most glm families (Poisson, Gamma, Gaussian, Binomial) are exponential families

$f(x \mid \theta) \propto \exp\left(\sum_i \eta_i(\theta)\, T_i(x) - A(\theta)\right)$

  • Computationally easy
  • Has sufficient statistics: easier to estimate parameter variance
  • … but it doesn't describe everything
  • mgcv has expanded to cover many new families
  • Lets you model a much wider range of scenarios with smooths
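
As a worked instance of the exponential family form above, the Poisson distribution with mean λ can be rearranged as follows. This is a standard identity, written out here only for illustration:

```latex
\begin{align*}
f(x \mid \lambda) &= \frac{\lambda^{x} e^{-\lambda}}{x!}
  = \frac{1}{x!}\exp\bigl(x \log\lambda - \lambda\bigr), \\
\eta(\lambda) &= \log\lambda, \qquad T(x) = x, \qquad A(\lambda) = \lambda .
\end{align*}
```

The factor 1/x! does not depend on λ, so it plays no role in estimating λ. The sufficient statistic T(x) = x is why sums of counts carry all the information about the mean.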

What we'll cover

  • “Counts”: Negative binomial and Tweedie distributions
  • Modelling proportions with the Beta distribution
  • Robust regression with the Student's t distribution
  • Ordered and unordered categorical data
  • Multivariate normal data
  • Modelling extra zeros with zero-inflated and adjusted families

  • NOTE: All the distributions we're covering here have their own quirks. Read the help files carefully before using them!
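
As a rough orientation, the topics above map onto mgcv family constructors roughly as listed in the comments below. The negative binomial fit that follows is only a minimal sketch on simulated data: the data frame, variable names and true curve are assumptions made up for illustration.

```r
library(mgcv)

# Topic -> mgcv family constructor (check each help page before use):
#   negative binomial / Tweedie counts : nb(), tw()
#   proportions on (0, 1)              : betar()
#   heavy-tailed (scaled t) responses  : scat()
#   ordered categories                 : ocat(R = number of categories)
#   unordered categories               : multinom(K = number of categories - 1)
#   multivariate normal responses      : mvn(d = dimension of the response)
#   extra zeros                        : ziP(), ziplss()

set.seed(3)
n   <- 300
x   <- runif(n)
# Simulated overdispersed counts (illustrative only).
dat <- data.frame(x = x,
                  y = rnbinom(n, mu = exp(1 + sin(2 * pi * x)), size = 2))

# nb() estimates the negative binomial theta alongside the smooth.
m_nb <- gam(y ~ s(x), data = dat, family = nb(), method = "REML")

# Extract the estimated theta on its natural (untransformed) scale.
theta_hat <- m_nb$family$getTheta(TRUE)
```

Swapping `nb()` for `tw()` gives the Tweedie analogue with the same formula.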

Modelling "counts"

Gaussian location-scale models (family = gaulss)

  • Model both the mean (“location”) and variance (“scale”) as smooth functions of predictors
  • Example uses: detecting early warning signs in time series, finding factors driving population variability
  • mgcv code: formula = list(y~s(x1)+s(x2), ~s(x2)+s(x3)), family=gaulss
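
The formula list above can be tried on simulated data along these lines. This is a minimal sketch: the data, the true mean and standard deviation curves, and the object names are all assumptions invented for the example.

```r
library(mgcv)

set.seed(1)
n  <- 400
x1 <- runif(n); x2 <- runif(n); x3 <- runif(n)

# Simulated truth (illustrative only): the mean depends on x1 and x2,
# the standard deviation on x2 and x3.
mu    <- sin(2 * pi * x1) + x2
sigma <- exp(-1 + 0.5 * x2 + x3)
dat   <- data.frame(y = rnorm(n, mu, sigma), x1, x2, x3)

# First formula: the mean. Second (one-sided) formula: the scale.
m_gaulss <- gam(list(y ~ s(x1) + s(x2), ~ s(x2) + s(x3)),
                data = dat, family = gaulss())

# On the link scale, predictions have one column per linear predictor.
eta <- predict(m_gaulss, type = "link")
```

Note that, according to the gaulss help page, the second linear predictor is parameterised in terms of the inverse standard deviation rather than the variance itself, so a larger value there means less spread.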

Zero-inflated Poisson location-scale models (family = ziplss)

  • Models the probability of zeros separately from the mean count, given that you've observed more than zero at a location.
  • Example uses: Counts of prey caught when a predator might switch between not hunting at all (zeros) and active hunting
  • mgcv code: formula = list(y~s(x1)+s(x2), ~s(x2)+s(x3)), family=ziplss
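
A comparable sketch for the zero-inflated case follows. The simulated data and the `rztpois` helper (which draws zero-truncated Poisson counts by redrawing zeros) are hypothetical and exist only for this example. In ziplss the first formula describes the Poisson mean of the non-zero process and the second the probability of presence.

```r
library(mgcv)

set.seed(2)
n  <- 500
x1 <- runif(n); x2 <- runif(n); x3 <- runif(n)

# Hypothetical helper: zero-truncated Poisson draws, redrawing any zeros.
rztpois <- function(lambda) {
  y <- rpois(length(lambda), lambda)
  zero <- y == 0
  while (any(zero)) {
    y[zero] <- rpois(sum(zero), lambda[zero])
    zero <- y == 0
  }
  y
}

# Simulated truth (illustrative only): presence depends on x2 and x3
# (complementary log-log scale), counts given presence on x1 and x2.
p_present <- 1 - exp(-exp(-0.5 + x2 + x3))
present   <- runif(n) < p_present
lambda    <- exp(1 + sin(2 * pi * x1) + 0.5 * x2)

y <- numeric(n)                       # zeros where the process is absent
y[present] <- rztpois(lambda[present])
dat <- data.frame(y, x1, x2, x3)

# First formula: Poisson mean. Second formula: probability of presence.
m_zip <- gam(list(y ~ s(x1) + s(x2), ~ s(x2) + s(x3)),
             data = dat, family = ziplss())

# One column of link-scale predictions per linear predictor.
eta_zip <- predict(m_zip, type = "link")
```

Zero-inflated fits can be poorly identified when the expected counts are small, so it is worth checking the fit against a plain Poisson or negative binomial model.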

The end of the distribution zoo

That's the end of this section! We reconvene after lunch (1:00 PM). You'll get to work through a few more advanced examples of your choice.