evoTS: Analyses of evolutionary time-series
Kjetil L. Voje
2024-06-28
Source:vignettes/evoTS_vignette.Rmd
evoTS_vignette.Rmd
1.0 About evoTS
The evoTS
package facilitates univariate and
multivariate analyses of phenotypic change within lineages.
The evoTS
package extends the univariate modeling
framework implemented in the
paleoTS
package (Hunt 2006; 2008a; 2008b; Hunt et al. 2008; 2010;
2015) and has been developed to mirror the user experience from
paleoTS
as much as possible. For example, all univariate
models implemented in evoTS
are fitted to a
paleoTS
object, i.e., the data format used in
paleoTS
. The fit of all univariate models available in
paleoTS
and evoTS
are directly comparable.
evoTS
contains a range of multivariate models, including
different versions of multivariate unbiased random walks and
Ornstein-Uhlenbeck processes. Together, these models allow the user to
test various hypotheses of trait evolution, e.g., whether traits change
in a correlated or uncorrelated manner, whether one trait/variable
affects the optimum of a second trait (Granger causality), whether
adaptation in different traits happen independently toward fixed optima
etc.
evoTS
also contains functions for calculating the
topology of the likelihood surfaces of fitted models, a useful feature
to investigate the range of parameter values with approximately equal
likelihood as the best parameter estimates.
1.1 Compatibility with paleoTS
(technical
info)
The models implemented in evoTS
build on the same
assumptions as the models available in the paleoTS
package:
all models assume the population (sample) means in the sequence of
ancestor-descendants have a joint distribution that is multivariate
normal with an expected mean vector and covariance matrix that are
functions of the parameters of each model, the time intervals separating
the populations (samples) in the sequence, and the sampling variances of
the trait means. Given the assumption of multivariate normality, the
expected distribution of sample means is given by their first, second,
and mixed moments (covariance). As in the paleoTS
package,
evoTS
use the built-in optimization routines in R for
estimating maximum likelihood parameter estimates. The default
hill-climbing optimization technique used in all univariate models
(L-BFGS-B) is a quasi-Newton method that constrains the optimization of
certain parameters (e.g., so that variance parameters cannot be smaller
than 0). All multivariate models use the Nelder-Mead hill climbing
search algorithm as default.
All models in evoTS
have been implemented using the
joint parameterization routine from the paleoTS
package.
The optimization is therefore fit using the actual sample values, with
the autocorrelation among samples accounted for in the log-likelihood
function.
In evoTS
, as in the paleoTS
package, the
default setting when fitting models is that the sample variance is
pooled across all samples/populations. Pooling sample variance makes
sense when the sample size is low for some or many of the
samples/populations in a time series. Pooling sample variance allows for
estimating a more precise sample/population variance common for all
samples/populations in stead of relying on each separate estimate of a
sample/population variance. There may be circumstances when estimating a
pooled sample/population variance is not necessary (e.g., if sample size
is high for all samples/populations) or beneficial (the hypothesis being
tested demands individually estimated sample variances). Whether sample
variances are pooled or not is controlled by setting
pool = TRUE
or FALSE
when fitting models in
evoTS.
Relative model fit in evoTS
is evaluated based on the
small sample-corrected version of the AICc (Akaike 1974; Burnham and
Anderson 2002).
2.0 Installation
The evoTS
package is available on CRAN and GitHub.
## Installing from GitHub
> install.packages("devtools")
> devtools::install_github("klvoje/evoTS")
> library(evoTS)
3.0 Getting data into evoTS
An object of class paleoTS
is the required input for
most of the functions in evoTS
. To create a
paleoTS
object, you need vectors of trait means,
sample/population variances, sample sizes and ages of the
samples/populations.
One easy way to create a paleoTS
object is to use the
as.paleoTS
function from the paleoTS
package.
## Creating example data
> trait_means<-rnorm(20)
> trait_variance<-rep(0.5,20)
> sample_size<-rep(30,20)
> time_vector<-seq(0,19,1)
# Create paleoTS object
> indata.evoTS<-paleoTS::as.paleoTS(mm = trait_means, vv = trait_variance, nn = sample_size, tt = time_vector)
Another way to get data into evoTS
is to use the
function read.paleoTS
from the paleoTS
package. This function imports data from a text file with four columns
corresponding to sample sizes, trait means, sample/population variances,
and ages of the samples/populations (in that order) and converts the
input to a paleoTS
object.
See also the paleoTS
(vignette)
for more info on how to import data and create a paleoTS
object.
3.1 Data included in evoTS
Two evolutionary sequences (time-series) of phenotypic change are
included in the evoTS
package. The data are from the diatom
lineage Stephanodiscus yellowstonensis and were originally
published in Theriot et al. (2006). Each trait consists of 63
samples spanning almost 14 000 years of phenotypic change.
We will use these data to illustrate many of the functions
implemented in evoTS.
We first investigate phenotypic change in the diameter of S.
yellowstonensis. The diameter has been measured in micrometers, but
we are interested in investigating proportional changes in the trait. We
therefore first do an approximate log-transformation of the data. We
then convert the time vector in the data set to unit length (i.e., the
length in time from the oldest to youngest sample/population in the data
set becomes 1). Such a linear transformation of the time vector does not
change how the estimated parameters describe the evolutionary dynamics
in the data, but ease parameter estimation and interpretation of certain
model parameters. We also plot the data to have a look at the phenotypic
changes.
## Doing an approximate log-transformation of the data
> ln.diameter<-paleoTS::ln.paleoTS(diameter_S.yellowstonensis)
## Convert the time vector to unit length
> ln.diameter$tt<-ln.diameter$tt/(max(ln.diameter$tt))
## Plotting the data
> plotevoTS(ln.diameter)
4.0 Univariate models in evoTS
The evoTS
package contains a range of univariate models
that expand and supplement the models available in
paleoTS.
The paleoTS
package contains functions to fit biased
(GRW) and unbiased random walks (URW), stasis (modeled as a white noise
process, i.e., uncorrelated variation around a constant mean), strict
stasis (no real evolutionary change) and an Ornstein-Uhlenbeck (OU)
processes assuming a fixed optimum (see Hunt 2006 and Hunt et
al. 2008 for info on these models). The paleoTS
package also contains models of a punctuated mode of evolution where
punctuations (jumps in phenotype space) separate different
parameterizations of the stasis model (Hunt 2008). A few other mode
shift models (where the model of evolution shifts at some point during
the evolutionary sequence) has also been implemented in
paleoTS
(Hunt 2008; Hunt et al. 2015).
The following univariate models have been implemented in the
evoTS
package:
A decelerated-evolution model (an unbiased random walk with an exponential decrease in the rate of change over time)
A accelerated-evolution model (an unbiased random walk with an exponential increase in the rate of change over time)
Ornstein-Uhlenbeck processes where the optimum changes according to an unbiased random walk.
4.1 Decelerated-evolution model
The unbiased random walk model in the paleoTS
package
model evolutionary changes as random draws from a normal distribution
with a mean of zero (Hunt 2006). Each draw from the normal distribution
represents a discrete evolutionary “step” and the variance of the normal
distribution is called the step variance. The decelerated model of
phyletic evolution is an unbiased random walk where the step variance is
reduced exponentially through time (Voje 2020). This model is closely
related to the early burst model developed for phylogenetic comparative
data (e.g., Harmon et al. 2010, Cooper and Purvis 2010), but
describes a reduced rate of evolution with time within a lineage and not
within a clade. As for the random walk model (Hunt 2006), the expected
evolutionary divergence between ancestor and descendant populations is
always zero in the model of decelerated evolution. The expected trait
mean and its variance and covariance are given by the following
expressions:
\[ E[z_{i}] = z_{0} \]
\[ Var[z_{i}] = \sigma ^{2}
_{step.0}\left[ \frac{ e^{rt_i} - 1}r \right] +\epsilon _{i} \]
\[ Cov[z_{i},z_{j}] = \sigma ^{2}
_{step.0}\left[ \frac{ e^{rt_{min}} - 1}r \right] \]
where \(z_{i}\) is the expected trait value for population i in the time series, \(z_{i}\) is the ancestral trait mean, \(\sigma ^{2} _{step.0}\) is the step distribution, \(r\) describes the exponential decay in the net rate of change through time and is constrained to be 0 or smaller, \(t_{i}\) is the time interval from the ancestral population mean (the start of the fossil sequence, which has a time of 0) to the ith population, and \(t_{min}\) is the time interval from the ancestral population to the oldest of the two populations \(z_{i}\) and \(z_{j}\).
The decerelated model of evolution can be fitted to a time series
using the opt.joint.decel
function.
> opt.joint.decel(ln.diameter)
paleoTSfit object [n = 63 , K = 3 ]
Model: Decel
Method: Joint
log-likelihood = 78.67217
AICc = -150.9376
Parameter estimates:
anc vstep r
3.7195253 0.4308633 -1.3114667
Additional elements not printed: convergence logLFunction
The output returns the log-likelihood of the model parameters,
the AICc score, the number (K) of estimated parameters, the
length of the analysed time-series (n), the model name and the
method used to parameterize the model (mMthod). anc is
the estimated ancestral trait value, vstep is the initial value
for the step distribution, and r describes the exponential
decay in the vstep parameter through time.
The time it takes to half the net rate of evolution can be calculated based on the value of r using \(-ln(2)/r\). The half-life parameter is interpreted based on the time-scale used when analyzing the data. Since time from start to end in our data has been scaled to unit length, the estimated half-life represent the percent of the total length of the time-series it takes for the rate of evolution to half. The half-life is (-ln(2)/-1.3114667 =) 0.53 in this example. The total length of the analyzed time-series is 13,728 years, which means it takes (13,728*0.53 =) 7,276 years for the net rate of evolution to be reduced by 50%.
What are the uncertainty of the estimated parameters in this model?
Standard errors of the parameters are returned by setting
hess = TRUE
when fitting a model. The standard errors are
calculated based on the Hessian matrix, which is a square matrix of
partial second order derivatives.
Another way to assess the uncertainty of the estimated parameters is to explore the likelihood-surface of the fitted model.
Investigating the likelihood surface can be helpful for several reasons.
Computing the likelihood surface is a great way to explore which parameter combinations that have an almost identical likelihood compared to the maximum likelihood values. Investigating the log-likelihood surface is also an approach to assess uncertainty in the estimated parameters. A large range of parameter values that have almost the same log-likelihood is an indication that we should be careful putting too much emphasis on only the maximum-likelihood (best) estimates of the parameters. The functions in
evoTS
calculating log-likelihood surfaces report the upper and lower parameter estimates that are within two support units of the best estimate as a way to assess uncertainty in parameters (Edwards 1992). While standard errors computed from the Hessian matrix are always symmetric around the point estimate, the log-likelihood surface might not be (multivariate) normal. The reported upper and lower parameter estimates are therefore often not symmetrical around the maximum likelihood parameter estimates.Estimating parameters in a model using maximum likelihood always run the risk of returning parameters from a local and not a global optimum in the likelihood landscape. Investigating the support surface for combinations of parameters is one way to explore the topology of the likelihood-surface.
Ridges in the log-likelihood surface can make it challenging to identify maximum likelihood estimates of the model parameters in certain cases. Flat ridges may for example cause identifiability issues and problems for the model to converge. Investigating the log-likelihood surface can therefore help diagnose challenges related to failures of models convergence.
The evoTS
package contain functions to create likelihood
surfaces for univariate models in evoTS
and
paleoTS
(e.g., loglik.surface.stasis
,
loglik.surface.URW
, loglik.surface.GRW
,
loglik.surface.OU
). These functions need a
paleoTS
object and vectors containing candidate values for
the parameters to be evaluated. Which candidate values to investigate is
trial-and-error, but the maximum likelihood estimate of the parameter
should always be in the interval.
For the decelerated model of evolution, the vectors given to the
arguments vstep.vec
and r.vec
define the
pairwise combinations of parameters for which the function will estimate
the log-likelihood. The resolution of the input vectors therefore
determines how accurate the visual representation of the support surface
is, including the returned upper and lower estimates printed in the
console. A higher resolution gives better precision, but demands more
computation time. Note that the computed support surface is conditional
on the best estimates of the other model parameters that are not part of
the support surface (e.g., the estimated ancestral trait value is
assumed to be 3.7195253 in the example below).
One way to define the candidate values is to use the seq
function.
> loglik.surface.decel(ln.diameter, vstep.vec = seq(0,1.3,0.01), r.vec = seq(-5,0,0.01))
lower upper
vstep 0.18 1.26
r -3.23 -0.21
From the likelihood surface and from the printed confidence
intervals, we see that r values between -3.23 and -0.21 are
within 2 log-likelihood units from the best estimate for this parameter.
This suggests we should be careful to exclude the possibility that the
half-life of the decay in the rate of evolution is as much as 330%
(45,312 years) or as low as 21% (2,946 years) of the investigated
time-interval.
4.2 Accelerated-evolution model
The accelerated evolution model is identical to the decelerated model except that the r parameter is constrained to be 0 or larger, which means the rate of evolution is accelerating with time.
The accelerated evolution model can be fitted using the
opt.joint.accel
function.
> opt.joint.accel(ln.diameter)
paleoTSfit object [n = 63 , K = 3 ]
Model: Accel
Method: Joint
log-likelihood = 77.57017
AICc = -148.7336
Parameter estimates:
anc vstep r
3.7104896 0.2387427 0.0000010
Additional elements not printed: convergence logLFunction
The accelerated evolution model has a lower (worse)
log-likelihood and higher (worse) AICc score compared to the decelerated
model of evolution.
A support surface can be produced using the
loglik.surface.accel
function.
> loglik.surface.accel(ln.diameter, vstep = seq(0,5,0.01), r.vec = seq(0,1.5, 0.005))
lower upper
vstep 0.090 5.00
r 0.035 1.35
The 3D plot can be rotated vertically and horizontally to get a
better overview of the likelihood surface, which is why the observation
angle is different for this 3D plot compared to the 3D plot for the
decelerated model.
4.4 Ornstein-Uhlenbeck model with moving optimum.
An Ornstein-Uhlenbeck model describes the evolution of a trait
towards an optimum. The paleoTS
package includes an
Ornstein-Uhlenbeck (OU) model of evolution with a single, fixed optimum
(Hunt et al. 2008), portraying evolutionary adaptation of a
trait towards a fixed peak on the adaptive landscape. However, peaks in
the adaptive landscape might not be fixed and the evoTS
package contains functions to fit OU models where the optimum (peak) is
constantly changing position according to an unbiased random walk. Such
a model was proposed by Hansen et al. (2008) for analyses of
phylogenetic comparative data. Adjusted to describe phenotypic evolution
within a single lineage, the expected trait mean and its variance and
covariance are given by the following expressions:
\[E[z_{i}] = e^{(-\alpha t_{i})}z_{0} +
(1-e^{-\alpha t_{i}})\theta\]
\[Var[z_{i}] =\left[ \frac{
\sigma^{2}_{step}+\sigma^{2}_{\theta}}{2\alpha} \right] \left(
1-e^{(-2\alpha t_{i})}\right) + \sigma^{2}_{\theta}t_{i}
\left[ 1-2(1-e^{-\alpha t_{i}}) /\alpha t_{i}\right] + \epsilon _{i}
\]
\[Cov[z_{i},z_{j}] =\left[
\frac{ \sigma^{2}_{step}+\sigma^{2}_{\theta}}{2\alpha} \right] \left(
1-e^{(-2\alpha t_{a})}\right)e^{-\alpha t_{ij}} +
\sigma^{2}_{\theta}t_{a} \left[ 1-\left(1+e^{-\alpha t_{ij}} \right)
\left( 1-e^{-\alpha t_{a}} \right) / \alpha t_{i}\right]\]
where \(z_{i}\) is the expected
trait value for the ith sample, \(z_{0}\) is the ancestral trait mean, \(t_{i}\) is the time interval from the
ancestral population mean (the start of the time-series, which has a
time of 0) to the ith sample, \(\theta\) is the optimum, \(\alpha\) measures the rate of adaptation to
the optimum, \(\sigma^{2}_{step}\) is
the variance of the stochastic perturbations of z, and \(\sigma^{2}_{\theta}\) is the variance of
the stochastic perturbations of the optimum, \(t_{a}\) is the time interval from the
ancestral population to the oldest of the two populations \(z_{i}\) and \(z_{j}\), and \(t_{ij}\) is the time separating two samples
\(z_{i}\) and \(z_{j}\). The estimation (sampling) error
\(\epsilon_{i}\) of the population
means contribute to the expected variance between two population
means.
The model can be fitted using the opt.joint.OUBM
function.
> opt.joint.OUBM(ln.diameter)
paleoTSfit object [n = 63 , K = 4 ]
Model: OU model with moving optimum (ancestral state at optimum)
Method: Joint
log-likelihood = 78.5667
AICc = -148.4437
Parameter estimates:
anc/theta.0 vstep.trait alpha vstep.opt
3.71050957 0.25577371 4.45009756 0.00000001
The vstep.opt parameter describes the rate of change
in the optimum. This is extremely small (virtually zero) in the example
above, which means the optimum is essentially fixed. The alpha in the OU
model represents the strength of the pull towards the optimum (Hansen
1997). A parameter that is easier to interpret compared to the alpha is
the half-life, \(ln(2)/ \alpha\), which
is the time it takes for the trait to move half-way from the ancestral
state to the optimum. The half life is therefore a quantification of the
speed of adaptation towards the optimal state. As for the decelerated
and accelerated models of evolution, the interpretation of the half life
depends on the time-interval covered by the time-series. Since the
time-interval of the time-series we analyze is scaled to unit length
(i.e., the time from the start to the end of the time-series is 1), this
means the half-life can be interpreted as the percent of the total
length of the time-series. The half-life in our example is \(ln(2)/ \alpha\) = 0.16. According to this
point estimate, it takes the trait 16% of the total length of the
time-series to evolve half-way towards the optimum, which is about
(13,728 years * 0.16 =) 2197 years.
Note that the name of the first reported parameter is
anc/theta.0. This parameter represents the ancestral
trait value, but also the value of the “ancestral” optimum. The default
option in the opt.joint.OUBM
function is to assume that the
trait was perfectly adapted at the start of the time-series (the
argument anc.opt = TRUE
), but this can be changed by
setting anc.opt = FALSE
> opt.joint.OUBM(ln.diameter, opt.anc = FALSE)
paleoTSfit object [n = 63 , K = 5 ]
Model: OU model with moving optimum
Method: Joint
log-likelihood = 80.71298
AICc = -150.3733
Parameter estimates:
anc vstep.trait theta.0 alpha vstep.opt
3.70316688 0.27295686 3.89044533 11.89309009 0.00000001
Setting opt.anc = FALSE
estimates a separate
“ancestral” value for the optimum (theta.0). The rate of change
in the optimum (vstep.opt
) is still negligible, which means
this model is virtually identical to a model where the optimum is fixed.
This can be shown by fitting an OU model where the optimum is fixed,
which is the model included in the paleoTS
package.
> paleoTS::opt.joint.OU(ln.diameter)
paleoTSfit object [n = 63 , K = 4 ]
Model: OU
Method: Joint
Convergence: Successful
log-likelihood = 80.71298
AICc = -152.7363
Parameter estimates:
anc vstep theta alpha
3.7031741 0.2729583 3.8904507 11.8938651
Additional elements not printed: convergence logLFunction
The fixed optimum model gives the same log-likelihood value as
the model where the optimum was allowed to change (but actually didn’t).
The fixed optimum model has a better AICc score as this model contains
one less parameter (the parameter describing the rate of change in the
optimum).
It is good practice to repeat any numerical optimization procedure from different starting points. This is especially important when the model has several parameters, as parameter-rich models may contain more than one peak in the log-likelihood surface. The OUBM model is a type of model that may have several local peaks in the likelihood space.
The user can choose the number of iterations of the numerical
optimization of the OUBM model using the argument
iterations.
The function will return the parameter values
from the run with the highest log-likelihood. The starting values in
each iteration are drawn from a normal distribution with mean zero and a
standard deviation set by the user (default is 1). The initial values
for the vstep and alpha parameters are constrained to
be equal or larger than 0.
Here, we run the opt.joint.OUBM
function (assuming the
trait value is perfectly adapted at the start of the sequence) from 100
different starting points (i.e., 100 different initial parameter
values):
> opt.joint.OUBM(ln.diameter, opt.anc = TRUE, iterations = 100)
The optimization method is executed from multiple different starting points. Number of iterations: 100
paleoTSfit object [n = 63 , K = 4 ]
Model: OU model with moving optimum (ancestral state at optimum)
Method: Joint
log-likelihood = 78.5667
AICc = -148.4437
Parameter estimates:
anc/theta.0 vstep.trait alpha vstep.opt
3.71050956 0.25577772 4.45011497 0.00000001
From the output, we see that the likelihood score of the best
model among the 100 model runs is identical to the score when we ran the
model without any iterations. However, the maximum likelihood parameter
estimates are slightly different (e.g., a difference in the sixth
decimal for the vstep parameter), but not to an extent that
changes our interpretation of the trait dynamics.
The evoTS
package contains functions to estimate
likelihood surfaces for the different versions of the OU models
(loglik.surface.OU
and loglik.surface.OUBM
).
In these functions, the likelihood surface is not estimated as a
function of the step variance and alpha parameter directly, but rather
as a function of two related parameters that are easier to give a
biological interpretation. The stationary variance, \(vstep/(2* \alpha)\), represents the
equilibrium variance of the OU process (Hansen et al. 2008) and
describes the variance expected in the trait after the trait has reached
the optimum. The half-life, \(log(2)/(\alpha)\), is the amount of time it
takes for the trait to move half-way from the ancestral state to the
optimum. The half-life is informative regarding the speed of adaptation
toward the optimal state. To get an idea for which candidate values to
investigate for the likelihood-surface, we first need to calculate the
maximum likelihood values of the stationary variance and half-life
parameters from the model output.
The OU model with a fixed optimum had the best relative model fit
according to AICc among the three versions of the OU model we
investigated. The maximum likelihood estimate of the half-life from this
OU model is \(log(2)/11.8941\) =
0.0583. The maximum likelihood estimate of the stationary variance is
\(0.2730/(2*11.8941)\) = 0.0115. But
these are only point-estimates. We can explore the support interval
around these point estimates of the half-life and the stationary
variance using the loglik.surface.OU
function.
> loglik.surface.OU(ln.diameter, stat.var.vec=seq(0,0.1,0.001), h.vec=seq(0,0.4,0.001))
lower upper
stationary variance 0.007 0.053
half-life 0.029 0.305
Half-life values up to 30% of the total length of the time-series are within two log-likelihood units from the best estimate. This indicates that substantially slower evolution than the point estimate of a 6% half-life cannot be ruled out as a possibility.
4.5 Fitting all univariate models in evoTS
and
paleoTS
A quick way to evaluate the relative fit of all univariate models in
the evoTS
and paleoTS
packages (excluding
models with mode shifts) is to use the fit.all.univariate
function.
> fit.all.univariate(ln.diameter, pool = TRUE)
Comparing 9 models [n = 63, method = Joint]
logL K AICc dAICc Akaike.wt
GRW 77.64073 3 -148.87469 3.861618 0.058
URW 77.57018 2 -150.94035 1.795953 0.162
Stasis 39.84019 2 -75.48039 77.255917 0.000
StrictStasis -707.46411 1 1416.99379 1569.730094 0.000
Decel 78.67217 3 -150.93756 1.798746 0.162
Accel 77.57017 3 -148.73357 4.002737 0.054
OU 80.71298 4 -152.73631 0.000000 0.397
OU model with moving optimum (ancestral state at optimum) 78.56670 4 -148.44374 4.292567 0.046
OU model with moving optimum 80.71298 5 -150.37333 2.362978 0.122
4.6 Fitting combinations of univariate models to a time series (mode shift)
There is no a priori reason why a lineage should follow only one mode
of evolution. The evoTS
package allows for investigating
all pairwise model combinations of the models stasis, unbiased random
walk (URW), trend (GRW) and an Ornstein-Uhlenbeck (OU) process with a
fixed optimum using the function fit.mode.shift
. It is
possible to either investigate specific shift points using the argument
shift.point
or investigate all possible shift points, like
below.
> fit.mode.shift(ln.diameter, model1 = "URW", model2 = "URW", minb = 10)
Comparing 9 models [n = 63, method = Joint]
logL K AICc dAICc Akaike.wt
GRW 77.64073 3 -148.87469 3.861618 0.058
URW 77.57018 2 -150.94035 1.795953 0.162
Stasis 39.84019 2 -75.48039 77.255917 0.000
StrictStasis -707.46411 1 1416.99379 1569.730094 0.000
Decel 78.67217 3 -150.93756 1.798746 0.162
Accel 77.57017 3 -148.73357 4.002737 0.054
OU 80.71298 4 -152.73631 0.000000 0.397
OU model with moving optimum (ancestral state at optimum) 78.56670 4 -148.44374 4.292567 0.046
OU model with moving optimum 80.71298 5 -150.37333 2.362978 0.122
> fit.mode.shift(ln.diameter, model1 = "URW", model2 = "URW", minb = 10)
[1] "Searching all possible shift points in the evolutionary sequence"
Total # hypotheses: 44
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44
paleoTSfit object [n = 63 , K = 4 ]
Model: URW-URW
Method: Joint
log-likelihood = 79.27473
AICc = -149.8598
Parameter estimates:
anc vstep vstep shift1
3.7304865 0.2432606 0.2494830 52.0000000
Log-likelihoods of all tested shift-points
shift all.logl
11 70.86731
12 68.59747
13 62.65156
14 65.41455
15 53.23718
16 60.25449
17 56.19608
18 45.50594
19 42.55632
20 44.70092
21 46.05836
22 46.58797
23 48.14004
24 44.19647
25 67.63474
26 57.82010
27 64.69928
28 69.73926
29 58.45089
30 55.26295
31 53.40897
32 64.95625
33 66.97684
34 66.65108
35 72.98220
36 65.42928
37 71.12637
38 76.39216 *
39 74.96141 *
40 74.07654
41 74.06923
42 77.98640 *
43 75.27489 *
44 73.96364
45 78.18181 *
46 74.92223 *
47 75.10229 *
48 77.07127 *
49 77.28518 *
50 79.12256 *
51 75.99980 *
52 79.27473 **
53 76.45725 *
54 70.37393
Shift occurs immediately AFTER listed sample number
** = maximum-likelihood shift point
* = additional shift points in CI [ within 4.744 logL units; Chi-sq P = 0.95 , df = 4 ]
Additional elements not printed: convergence logLFunction
The function fit.mode.shift
can also be used to fit all
pairwise combinations of the four models by setting the
fit.all
argument as TRUE.
If a shift point is
not defined (using the shift.point
argument), all possible
shift points are investigated for all models.
> fit.mode.shift(ln.diameter, fit.all = TRUE, minb = 10)
[1] "Searching all possible shift points in the evolutionary sequence"
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44
Comparing 16 models [n = 63, method = Joint]
logL K AICc dAICc Akaike.wt
Stasis-Stasis 53.24711 4 -97.80456 64.3140156 0.000
Stasis-URW 72.84306 4 -136.99646 25.1221145 0.000
Stasis-GRW 72.84414 5 -134.63564 27.4829349 0.000
Stasis-OU 74.97027 6 -136.44053 25.6780478 0.000
URW-URW 79.27473 4 -149.85981 12.2587671 0.001
URW-GRW 79.54430 5 -148.03597 14.0826065 0.000
URW-OU 85.34081 6 -157.18163 4.9369506 0.026
GRW-GRW 84.03615 6 -154.57229 7.5462836 0.007
GRW-OU 88.87002 7 -161.70368 0.4148986 0.254
OU-OU 90.26555 8 -161.86443 0.2541469 0.275
OU-GRW 84.14765 7 -152.25894 9.8596410 0.002
OU-URW 83.90670 6 -154.31340 7.8051743 0.006
OU-Stasis 87.80929 6 -162.11858 0.0000000 0.312
GRW-URW 83.79338 5 -156.53412 5.5844543 0.019
GRW-Stasis 86.49922 6 -159.49844 2.6201411 0.084
URW-Stasis 83.36672 5 -155.68080 6.4377793 0.012
[[1]]
paleoTSfit object [n = 63 , K = 6 ]
Model: OU-Stasis
Method: Joint
log-likelihood = 87.80929
AICc = -162.1186
Parameter estimates:
anc vstep theta_OU alpha omega shift1
3.705695031 0.327387632 3.817864883 5.226471406 0.001870449 38.000000000
Additional elements not printed: convergence logLFunction GG
The function returns a list of the highest log-likelihood
found for each investigated model. A detailed output from the model with
the lowest AICc value among the 16 candidate models is also given. An
OU-Stasis model with a shift point at sample (population) 38 has the
best relative fit according to AICc. Note, however, that the
model-combination Trend-OU (GRW-OU) has an almost equal AICc score
relative to the best model. Also the combination of two OU models (each
with their own fixed optimum) shows a good relative fit to the data.
4.8 Simulating data
It is possible to simulate data for all implemented models in
evoTS
. Standard R-package documentation can be seen by
entering ?sim.OUBM
and ?sim.accel.decel
5.0 Multivariate models
Traits are rarely changing independently of each other due to shared
genetics or development. Evolution is accordingly a multivariate
phenomenon. The evoTS
package includes functions to fit the
following multivariate trait models:
Multivariate unbiased random walks
Multivariate decelerated evolution
Multivariate accelerated evolution
Multivariate Ornstein-Uhlenbeck processes
5.1 Multivariate unbiased random walks with and without changes in the rate of evolution
Evolution as a multivariate unbiased random walk is modeled using an evolutionary rate matrix R (Felsenstein 1988; Revell and Harmon 2008). The diagonal elements in R represent the rate of evolution for the individual traits, while the off-diagonal elements represent the extent to which different traits co-evolve.
The multivariate variance-covariance matrix for the unbiased random
walk model (V) is computed using the Kronecker product
of the R matrix and a “distance matrix”
C, describing the distance in time between the
different samples/populations in the time-series.
\[V = \sum_{i=1}^{m} R \otimes C\]
Sampling error of the trait mean (calculated as the sample variance
divided by the sample size) is – as for all models in evoTS
– added to the diagonal of the V matrix. To ensure
symmetric positive definiteness of the V matrix during
log-likelihood optimization, R is parameterized by its
Cholesky decomposition as the cross-product of upper triangular
matrices. In cases where different parts of the evolutionary time-series
are described by \(R_{m}\) matrices,
V is computed as the sum of the different \(R_{m}\) and \(C_{m}\) matrices:
\[V = \sum_{i=1}^{m} R_{m} \otimes C_{m}\]
The current implementation of the multivariate unbiased random
walk model allows testing six variants of the model. All variants of the
model can be fitted using different specifications of R
and
r
arguments in the fit.multivariate.URW
function.
There are two options for the structure of the R
matrix. Setting R = "diagonal"
means only the diagonal
elements of the R matrix will be estimated while
off-diagonal elements are set to 0 (see panel a below). This
parameterization of the R matrix means the changes in
the traits are assumed to be uncorrelated. Setting
R = "symmetric"
means all (both diagonal and off-diagonal)
elements in the R matrix are estimated (panel b below).
This parameterization estimates how changes in the traits are
correlated.
The argument r
in the
fit.multivariate.URW
function defines whether the rate of
change is assumed constant (“fixed”), asymptotically decreasing
(“decel”), or asymptotically increasing (“accel”) with time. Defining
r
as “fixed” means a regular multivariate unbiased random
walk is fitted to the data. The “decel” and “accel” options fit
multivariate versions of the decelerated and accelerated versions of the
unbiased random walk, respectively. These latter two models deviate from
the multivariate unbiased random walk in that the distance matrix
C is transformed by an exponential decay or
acceleration parameter, r, that is jointly estimated during the
maximum likelihood search for the R matrix. A common
r parameter is assumed for all the traits.
To use the multivariate models, we first need multivariate data (i.e., at least two traits or variables).
The data set on phenotypic evolution in Stephanodiscus
yellowstonensis (Theriot et al. 2006) contains data on the number
of ribs in addition to the size of the diameter we have analyzed so far.
Before combining the rib and diameter data into a multivariate data set,
we first do an approximate log-transformation of the rib data and
convert the time vector to unit length.
## Doing an approximate log-transformation of the data
> ln.ribs<-paleoTS::ln.paleoTS(ribs_S.yellowstonensis)
## Convert the time vector to unit length
> ln.ribs$tt<-ln.ribs$tt/(max(ln.ribs$tt))
We combine the two paleoTS
objects into a
multivariate evoTS
object using the
make.multivar.evoTS
function to make a multivariate data
set ready to be analyzed in evoTS
.
We can use the plotevoTS.multivariate
function to
have a look at the combined data set.
> plotevoTS.multivariate(diam.ln_ribs.ln, y_min=3.4, y_max=4.8, x.label = "Relative time", pch=c(20,20))
Eye-balling the data seems to suggest that the traits
change in a coordinated fashion.
We first fit a multivariate unbiased random walk model where the
off-diagonal elements in the R matrix are zero, and the
rate of evolution is assumes constant. This is equivalent as fitting two
separate univariate unbiased random walks models to each of the two
time-series.
> fit.multivariate.URW(diam.ln_ribs.ln, R = "diag", r = "fixed")
[1] "Model converged successfully."
$modelName
[1] "Multivariate Random walk (R matrix with zero off-diagonal elements)"
$logL
[1] 136.1905
$AICc
[1] -263.6914
$ancestral.values
[1] 3.71049 4.00711
$SE.anc
[1] NA
$R
[,1] [,2]
[1,] 0.2387433 0.0000000
[2,] 0.0000000 0.4568377
$SE.R
[1] NA
$method
[1] "Joint"
$K
[1] 4
$n
[1] 63
$iter
[1] NA
attr(,"class")
[1] "paleoTSfit"
The returned parameters include the ancestral trait values for
the two traits and the evolutionary rate matrix R. The
diagonal in the R matrix contains the step size (rate
of evolution) parameters. The second trait (ribs) has about twice the
rate of evolution as the first parameter (diameter). The off-diagonal
elements are zero as this model is not estimating the covariance of the
evolutionary changes in the two traits.
Next, we fit a model that allows the off-diagonal elements in the R
matrix to be different from zero. We do this by setting
R = symmetric
. We are still keeping the rate of change
fixed through time.
> fit.multivariate.URW(diam.ln_ribs.ln, R = "symmetric", r = "fixed")
$converge
[1] "Model converged successfully"
$modelName
[1] "Multivariate Random walk (R matrix with non-zero off-diagonal elements)"
$logL
[1] 182.3777
$AICc
[1] -353.7027
$ancestral.values
[1] 3.717695 4.025925
$SE.anc
[1] NA
$R
[,1] [,2]
[1,] 0.2680092 0.3780642
[2,] 0.3780642 0.5524616
$SE.R
[1] NA
$method
[1] "Joint"
$K
[1] 5
$n
[1] 63
$iter
[1] NA
attr(,"class")
[1] "paleoTSfit"
A multivariate random walk with correlated changes has a much
better fit compared to the model assuming uncorrelated changes in the
traits according to AICc. This indicates that the traits are not
evolving independently of each other. The estimated R
matrix indicates that the first trait has about half the rate of
evolution as the second trait and that there is substantial covariance
in the evolutionary changes of the two traits. How the two traits
correlate in their changes can be computed by standardizing the
covariance with the product of the standard deviations on the diagonal
(this can also be done using the function cov2cor
in the
stats
package).
> 0.3780642/(sqrt(0.2680092)*sqrt(0.5524616))
[1] 0.9825161
# Or alternatively:
> model1<-fit.multivariate.URW(diam.ln_ribs.ln, R = "symmetric", r = "fixed")
[1] "Model converged successfully."
> stats::cov2cor(model1$R)
[,1] [,2]
[1,] 1.000000 0.982516
[2,] 0.982516 1.000000
A correlation of 0.98 basically means the two traits evolve as
a single trait, since at least part of the deviation from a correlation
of 1 is due to measurement error.
Note that the R matrix is not describing the underlying genetic or phenotypic (co)variances of the traits. The R matrix is therefore not the same as a P (or G) matrix in quantitative genetics. However, the R matrix is tightly connected to these matrices. For example, if the traits evolve only due to drift, the R matrix is expected to be proportional to the additive genetic variance–covariance matrix (G) (Lande 1979; Felsenstein 1988). Estimating R can therefore aid in evolutionary interpretations of the fossil record anchored in evolutionary quantitative genetics.
Standard errors of the elements in the R matrix can be approximated
by the square root of the diagonal elements of the inverse of the
negative of the Hessian matrix. These standard errors are automatically
estimated and reported if the argument hess=TRUE
in
defined.
Parameterizing multivariate models is demanding in terms of
computational time (Felsenstein 1973; Hadfield and Nakagawa 2010;
Freckleton 2012). Be aware that fitting the multivariate models to
several traits with many samples (populations) will take much longer
time than fitting univariate models to each trait separately. One way to
somewhat follow the progress of the model fit is to set
trace = TRUE
. This allows the user to follow the progress
of the optimization routine to minimize the likelihood function.
The likelihood surface of multivariate models can contain several
local peaks. It is therefore recommended to rerun the model from
different starting points (i.e., different initial parameter values).
The number of iterations can be defined by the iterations argument
(e.g., iterations = 10
). Initial values for the search
algorithm are drawn from a normal distribution with a default standard
deviation of 1. The user can set this standard deviation by the
iter.sd
argument. (e.g., iter.sd = 0.5
).
We can check if we find evidence for a change in the
R matrix along the time-series. This involves
estimating a separate R matrix for two non-overlapping
parts of the time-series. This is done using the function
fit.multivariate.URW.shift
. We can define the shift point
using the argument shift.point
(e.g.,
shift.point = 20
) or we can investigate all possible shift
points in the time series by not defining a shift point (the default
option). The length of the smallest number of samples (populations) in
each of the two segments is controlled by the minb
argument
(the default is 10).
> fit.multivariate.URW.shift(diam.ln_ribs.ln, hess = TRUE)
Total # hypotheses: 44
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44
$converge
[1] "Model converged successfully"
$modelName
[1] "Multivariate Random walk with two R matrices (with non-zero off-diagonal elements)"
$logL
[1] 190.7035
$AICc
[1] -360.0108
$ancestral.values
[1] 3.730242 4.061121
$SE.anc
[1] 0.02942139 0.04554309
$R
$R$R.1
[,1] [,2]
[1,] 0.3712051 0.6509228
[2,] 0.6509228 1.1705596
$R$R.2
[,1] [,2]
[1,] 0.2216526 0.2628519
[2,] 0.2628519 0.3164376
$SE.R
$SE.R$SE.R.1
[,1] [,2]
[1,] 0.01712463 0.02980279
[2,] 0.02980279 0.06689237
$SE.R$SE.R.2
[,1] [,2]
[1,] 0.003596406 0.004424130
[2,] 0.004424130 0.006407946
$method
[1] "Joint"
$K
[1] 9
$n
[1] 63
$iter
[1] NA
$parameters
shift1
18
$all.logl
[1] 186.7757 185.1417 186.0625 189.5071 189.7518 187.6900 187.1944 190.7035 187.8904 187.7113 187.0241 186.5424 185.8391 185.3580 187.9486 185.8228 186.7893 186.8395
[19] 187.8938 186.4932 185.8709 186.7557 186.0485 184.9683 184.9777 184.3315 183.9433 183.9788 182.8478 182.2962 181.5945 181.9293 181.2754 180.8469 181.1119 180.5572
[37] 180.0302 179.8013 179.7917 180.0224 179.8181 179.8010 179.5718 179.7813
$GG
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25] [,26] [,27] [,28]
[1,] 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38
[,29] [,30] [,31] [,32] [,33] [,34] [,35] [,36] [,37] [,38] [,39] [,40] [,41] [,42] [,43] [,44]
[1,] 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
attr(,"class")
[1] "paleoTSfit"
Two R matrices are returned (R.1 and
R.2), along with their standard errors (since we used
hess = TRUE
). The estimated shift point is 18
(shift1). The rate of evolution is much larger in the second
trait compared to the first trait (the diagonal elements in
R.1) while the difference in rate of evolution is much smaller
after the shift point (the diagonal elements in R.2). The
evolutionary correlation is 0.98 and 0.99 before and after the shift
point. The number of parameters (K) for this model is 9
compared to 5 for the model where we estimated a single
R matrix for the multivariate data set. The model with
two R matrices has a lower (better) AICc score compared
to the model with a single R matrix. This difference in
AICc scores is likely to be a result of the differences in rates of
evolution in R.1 and R.2 since the evolutionary
correlation is estimated to be very similar in both R.1 and
R.2.
We now check if the multivariate accelerated and decelerated models
have a better fit than the multivariate random walk models.
> multi.accel<-fit.multivariate.URW(diam.ln_ribs.ln, R = "symmetric", r = "accel")
> multi.decel<-fit.multivariate.URW(diam.ln_ribs.ln, R = "symmetric", r = "decel")
> multi.accel$AICc;multi.decel$AICc
[1] -351.2512
[1] -356.3287
The multivariate decelerated evolution model has a similar
(but worse) fit to the data compared to the multivariate unbiased random
walk model with a single R matrix, but is out-competed
by the model estimating two R matrices.
5.2 Multivariate Ornstein-Uhlenbeck models
Multivariate Ornstein-Uhlenbeck models allow for testing a range of
different adaptive hypotheses of trait evolution, as detailed below.
Adapted to describe evolution of traits within the same lineage, the
multivariate Ornstein-Uhlenbeck process is described by the following
differential equation (Bartoszek et al. 2012; Reitan et
al. 2012; Clavel et al. 2015):
\[dY = A(\theta(t)-Y(t))dt + RdW(t)\]
where A is a square matrix that describes the rate of
evolution toward the optimal trait values and with dimensions equal to
the number of investigated traits (A is often called
the pull matrix), \(\theta\) is a
vector containing the optimum for each trait, R is a
square matrix describing the stochastic changes in the traits and with
dimensions equal to the number of investigated traits
(R is often called the drift matrix), and \(W\) is the diffusion parameter. Under the
assumption that the we only have one selective regime (optimum) per
trait, the expected trait mean of the Ornstein-Uhlenbeck process is the
weighted sum of the optimum and the root value (Hansen 1997):
\[E\left[Z_{i}\right]=
e^{(-At_{i})}z_{0} + (1-e^{(-At_{i})})\theta\]
where \(Z_{i}\) is a vector containing the expected
trait values for sample i, \(z_{0}\) is a vector containing the
ancestral trait means, \(\theta\) is a
vector containing the optima, and \(t_{i}\) is the time interval from the
ancestral population mean (the start of the time-series, which has a
time of 0) to the ith population mean.
The variance and covariance of sample/population means are given by
the following expression:
\[Cov(z_{i}, z_{j})=\left[P\left( \left[
\frac{1}{\lambda_{k} + \lambda_{l}} \left( 1-e^{-({\lambda_{k} +
\lambda_{l})}t_{a}}\right)\right]_{1\le kl \le m} \odot
P^{-1}\Sigma\Sigma^{T}(P^{-1})^{T}\right)
P^{T}\right]e^{-A^{T}t_{ij}}\]
where P is
the orthogonal matrix of eigenvectors of A, \(\Sigma\Sigma^{T}\) is the Cholesky
decomposition of the R matrix, \(\lambda_{i}\) is the ith
eigenvalue of A, \(t_{a}\) is the time interval from the
ancestral population to the oldest of the two populations \(z_{i}\) and \(z_{j}\), and \(t_{ij}\) is the time separating two samples
\(z_{i}\) and \(z_{j}\), while \(\odot\) represents the Hadamard
(element-wise) product.
Different ways to parameterize the A matrix allows
testing a range of evolutionary hypotheses using the multivariate OU
process. While the R matrix in the multivariate version
of the unbiased random walk (and in the multivariate OU model) either
needs to be diagonal (only elements on the diagonal while off-diagonal
elements are zero) or complete (in the sense that both the diagonal and
off-diagonal elements are non-zero), this is not the case for the
A matrix. Generally, we can divide up the hypotheses
being tested using the multivariate OU into four different types:
Independent evolution (only diagonal elements in both the A and the R matrix, i.e., equivalent to fitting univariate models to each trait separately.)
Independent adaptation (only diagonal elements in the A matrix while the R matrix is completely parameterized. In such a model, the traits are adapting independently toward their optima, but the stochastic changes in the traits are correlated.)
At least one trait affects the optimum of the other trait. The diagonal and at least one of the off-diagonal elements in the A matrix is non-zero. It is the trait in the column of the non-zero off-diagonal element that affects the optimum of the trait in the row of the non-zero off-diagonal element. A negative off-diagonal element means the trait evolves toward the optimum determined by the other trait, while a positive off-diagonal element means the trait evolve away from the optimum determined by the other trait. The stochastic changes in the trait can be either correlated or non-correlated.
The same traits affect each others optimum (but to different degrees). The same off-diagonal elements on both sides of the diagonal are parameterized in the A matrix. The stochastic changes in the trait can be either correlated or non-correlated.
For a data set consisting of two traits, there are four possible
parameterizations of the A matrix (see panels below).
Independent adaptation of both traits (panel a), trait Y affecting the
optimum of trait X (panel b), trait X affecting the optimum of trait Y,
and the traits X and Y affecting each others optima (panel d). Each of
these four ways of parameterizing the A matrix can be
combined with an R matrix with elements only on the
diagonal (the stochastic changes in the traits are uncorrelated) or a
completely parameterized R matrix (the stochastic
changes in the traits are correlated).
An A matrix with off-diagonal elements investigates (Granger) causality between the two traits/variables (Granger 1969; Schweder 1970). Simply speaking, we have evidence for Granger causality if observations in one time series is useful for forecasting observations in one or several other time series, which would be the case if one trait affects the optimum that another trait tracks with a lag. Multivariate OU models therefore enable us to move beyond interpreting correlations among variables. While correlation is a measure of linear dependence between two random variables, the time dimension (i.e., the lag/evolutionary inertia in the tracking of the changing optimum) of the potential relationship between two variables is important in Granger causality. And while a correlation is symmetric (the correlation between X and Y is the same as the correlation between Y and X), this is not necessarily the case for Granger causality. X can (Granger) cause Y, without Y Granger-causing X.
It is also possible to implement a model where the last trait in the
data set evolves as an unbiased random walk which affects the optima for
all other traits in the data set. This model can be fitted by defining
A.matrix ="OUBM"
, which sets the last diagonal element in
the A matrix to zero. A zero diagonal element in the
A matrix means there is no tendency for this trait to
evolve towards an optimal value (which is the case for an unbiased
random walk). Setting A.matrix ="OUBM"
will automatically
define the R matrix to only have diagonal elements
(i.e., this option for the A matrix will override how
the R matrix is defined by the user). The reason for
this is that the stochastic changes in the variable evolving as an
unbiased random walk will affect the optimum of the other traits, which
means the stochastic trait dynamics of the traits evolving according to
an OU model should be independent of the changes in the optimum.
Multivariate Ornstein-Uhlenbeck models can be fitted using two
different functions in evoTS
. The function
fit.multivariate.OU
allows the user to use pre-defined
arguments for the A and R matrices to
parameterize these matrices. The A.matrix
can either be
defined as “diag” (panel a above), “upper.tri” (panel b above),
“lower.tri” (panel c above), and “full” (panel d above). Default is
“diag”. The R.matrix
can be defined as “diag” or
“symmetric”. The function fit.multivariate.OU
allows the
user to test all (sensible) hypotheses for multivariate data that
consists of two traits.
If the data consists of more than two traits, it is recommended to
use the function fit.multivariate.OU.user.defined
. This
function lets the user define which elements in the A
and R matrices that will be parameterized. This
function therefore allows full flexibility for the type of hypotheses
that are being tested as the parameterization is not constrained to
follow the pre-defined options for the A matrix
available in the function fit.multivariate.OU
.
We will investigate whether the two traits from the S.
yellowstonensis lineage show evidence of independent evolution
(only diagonal elements in the A and R
matrices) or whether only the adaptation part of the trait dynamics is
independent in the two traits (diagonal A matrix and
symmetric R matrix). We will test these hypotheses
using the fit.multivariate.OU
function.
Note that the multivariate OU model demands much more computational
time compared to univariate models and simple multivariate models (like
the multivariate unbiased random walk). The computational time grows
exponentially with the dimension of the variance–covariance matrix
(e.g., Felsenstein 1973; Hadfield and Nakagawa 2010; Freckleton 2012).
The time it takes to fit the multivariate OU models therefore both
depends on the number of traits and the length of the time series.
Setting trace = TRUE
allows the user to keep an eye on how
the optimization proceeds.
> OUOU.model1<-fit.multivariate.OU(diam.ln_ribs.ln, A.matrix="diag", R.matrix="diag")
> OUOU.model2<-fit.multivariate.OU(diam.ln_ribs.ln, A.matrix="diag", R.matrix="symmetric")
> OUOU.model1$AICc;OUOU.model2$AICc
[1] -267.6676
[1] -352.0637
> OUOU.model2
$converge
[1] "Model converged successfully"
$logL
[1] 186.7299
$AICc
[1] -352.0637
$ancestral.values
[1] 3.718630 4.006437
$SE.anc
[1] NA
$optima
[1] 3.876449 4.282549
$SE.optima
[1] NA
$A
[,1] [,2]
[1,] 12.70187 0.00000
[2,] 0.00000 14.60702
$SE.A
[1] NA
$half.life
[1] 0.05457047 0.04745301
$R
[,1] [,2]
[1,] 0.02233866 0.07823089
[2,] 0.07823089 0.85571347
$SE.R
[1] NA
$method
[1] "Joint"
$K
[1] 9
$n
[1] 63
$iter
[1] NA
attr(,"class")
[1] "paleoTSfit"
>stats::cov2cor(OUOU.model2$R)
[,1] [,2]
[1,] 1.0000000 0.5576264
[2,] 0.5576264 1.0000000
The model with independent adaptation and correlated
stochastic changes is much better than the independent evolution model.
The half-life for the log diameter and log number of ribs are 6.1% and
4.9% of the length of the time-series, which translates into (13728 *
0.061 =) 837 and (13728 * 0.049 =) 673 years respectively. The
correlation of the stochastic changes is substantial, but much reduced
compared to the estimate of the correlation from the multivariate
unbiased random walk. This is because a substantial part of the trait
dynamics in a multivariate OU model is due to the deterministic approach
of the traits toward the optima. The model has an almost identical
relative fit compared to the multivariate unbiased random walk according
to AICc, but is out-competed by the unbiased random walk with a mode
shift.
We now test if a more complex parameterization of the
A matrix gives us a better relative model fit according
to AICc.
> OUOU.model3<-fit.multivariate.OU(diam.ln_ribs.ln, A.matrix="upper.tri", R.matrix="symmetric")
> OUOU.model4<-fit.multivariate.OU(diam.ln_ribs.ln, A.matrix="full", R.matrix="symmetric")
> OUOU.model5<-fit.multivariate.OU(diam.ln_ribs.ln, A.matrix="OUBM")
> OUOU.model3$AICc;OUOU.model4$AICc;OUOU.model5$AICc
[1] -350.0442
[1] -380.0126
[1] -305.4217
The best model is a model where each trait affects the optimum
of the other trait.
However, before we trust this result, we should make sure to run the
multivariate models from different initial starting values to increase
our chances of finding a potentially higher peak on the log-likelihood
surface. The number of iterations can be defined by the
iterations
argument. The starting values are drawn from a
normal distribution with a standard deviation of one. The user can
define a larger or smaller standard deviation using the argument
iter.sd
.
All models possible to investigate using the
fit.multivariate.OU
function can also be investigated using
the fit.multivariate.OU.user.defined
function as the latter
lets the user define which elements in the A and
R matrices that are parameterized. Which elements in
A and R that should be parameterized
or set to zero are given by 1 and 0 respectively. We can for example fit
a model with a lower triangle A matrix and a symmetric
R matrix. A lower triangle A matrix
means the diameter is affecting the optimum of the number of ribs.
> A <- matrix(c(1,0,1,1), nrow=2, byrow=TRUE)
> R <- matrix(c(1,1,1,1), nrow=2, byrow=TRUE)
> OUOU.model6<-fit.multivariate.OU.user.defined(diam.ln_ribs.ln, A.user=A, R.user=R)
> OUOU.model6$AICc
[1] -350.146
The fit.multivariate.OU.user.defined
function is
especially handy if we have a data set consisting of more than two
traits. Let’s for example assume that we have a multivariate data set
consisting of two phenotypic traits and a climatic variable, which we
refer to as variables M, N and O, respectively. Let’s assume we have an
hypothesis that the climatic variable (O) is changing as a random walk
and is affecting the optimum of trait N. We also hypothesize that trait
N affects the optimum of trait M, but that trait M is not affecting the
optimum of trait N. We assume the stochastic changes in traits M, N are
correlated while changes in O are uncorrelated with M and N. This
hypothesis is parameterized by defining the A and
R matrices the following way.
> A <- matrix(c(1,1,0,0,1,1,0,0,0), nrow=3, byrow=TRUE)
> A
[,1] [,2] [,3]
[1,] 1 1 0
[2,] 0 1 1
[3,] 0 0 0
> R <- matrix(c(1,1,0,1,1,0,0,0,1), nrow=3, byrow=TRUE)
> R
[,1] [,2] [,3]
[1,] 1 1 0
[2,] 1 1 0
[3,] 0 0 1
Note that it is possible to define nonsensical biological hypotheses
given the full flexibility on how to parameterize A and
R using the
fit.multivariate.OU.user.defined
function. For example, an
A matrix with only zeros on the diagonal and non-zero
off-diagonal elements would parameterize a model where the variables
changes according to a random walk (since the diagonal elements are
zero), but where all variables are assumed to affect the optima for all
variables in the data set (all off-diagonal elements are non-zero). This
is a non-sense model as a variable changing according to a random walk
has no tendency (or ability) to evolve towards an optimum.
5.2.1 Initial parameter values
All model fitting procedures to find maximum likelihood solutions
need initial starting values for the model parameters. Most of the
starting values when fitting multivariate OU models are based on maximum
likelihood parameter estimates of the univariate versions of the OU
model fitted to each trait separately. The starting values for the
off-diagonal elements in the A and R
matrices are set to 0 and 0.5, respectively. This way to set the initial
parameter values seems to work fine for most data sets, but there is no
guarantee that the provided initial parameters will always work as this
depends on the nature of the data to be analyzed. If an error message is
returned saying “function cannot be evaluated at initial parameters”, it
is recommended that the user sets the iteration option to 1
(iteration = 1
). The model fitting algorithm will then
produce a set of (random) initial starting values and retry to start the
optimization procedure 100,000 times.
Another option is to define which initial parameter values the
optimization procedure should start from. These values can be provided
by setting the user.init...
(e.g.,
user.init.off.diag.R = 1
) to something else than NULL
(which is the default setting). Note that the length of the vector of
user-specified initial values need to match the number of parameters to
be estimated (e.g., user.init.off.diag.R = 1
for a
multivariate data set consisting of two traits,
user.init.off.diag.R = c(1,1,1)
if the data set contains
three traits, etc.).
6.0 References
Akaike. H. 1974. A new look at the statistical model identification. IEEE Transactions on Automatic Control 19:716 - 723
Bartoszek, K., J. Pienaar, P. Mostad, S. Andersson, and T. F. Hansen. 2012 A phylogenetic comparative method for studying multivariate adaptation. Journal of Theoretical Biology 314:204–215.
Burnham, K. P. and D. R. Anderson. 2002. Model Selection and multimodel inference: a practical information-theoretic approach (2nd ed.)- Springer-Verlag.
Clavel, J., G. Escarguel, and G. Merceron. 2015. mvmorph: an r package for fitting multivariate evolutionary models to morphometric data. Methods in Ecology and Evolution 6:1311–1319.
Cooper, N., and A. Purvis, 2010. Body size evolution in mammals: complexity in tempo and mode. The American Naturalist 175:727–738.
Edwards, A. W. F. 1992. Likelihood. expanded edition Johns Hopkins University Press. Baltimore, MD.
Felsenstein, J. 1973. Maximum-likelihood estimation of evolutionary trees from continuous characters. The American Journal of Human Genetics 25:471–492.
Felsenstein, J. 1988. Phylogenies and quantitative characters. Annual Review of Ecology, Evolution, and Systematics 19:445–471.
Freckleton R. P. 2012. Fast likelihood calculations for comparative analyses. Methods in Ecology and Evolution 3:940-947.
Granger, C. W. J. 1969. Investigating causal relations by econometric models and cross‐spectral methods. Econometrica 37:424-438.
Hadfield, J. D., and S. Nakagawa. 2010. General quantitative genetic methods for comparative biology: phylogenies, taxonomies and multi-trait models for continuous and categorical characters. Journal of Evolutionary Biology 23:494-508.
Hansen, T. F. 1997. Stabilizing selection and the comparative analysis of adaptation. Evolution 51:1341–1351.
Hansen, T. F., J. Pienaar, and S. H. Orzack. 2008. A comparative method for studying adaptation to a randomly evolving environment. Evolution 62:1965–1977.
Harmon, L. J. et al. 2010. Early bursts of body size and shape evolution are rare in comparative data. Evolution 64:2385–2396.
Hunt, G. 2006. Fitting and comparing models of phyletic evolution: random walks and beyond. Paleobiology 32:578–601.
Hunt, G. 2008a. Gradual or pulsed evolution: when should punctuational explanations be preferred? Paleobiology 34:360–377.
Hunt, G. 2008b. Evolutionary patterns within fossil lineages: model-based assessment of modes, rates, punctuations and process. In R.K. Bambach and P.H. Kelley, eds. From Evolution to Geobiology: Research Questions Driving Paleontology at the Start of a New Century:578–601.
Hunt, G., M. Bell & M. Travis. 2008. Evolution towards a new adaptive optimum: phenotypic evolution in a fossil stickleback lineage. Evolution 62:700-710.
Hunt, G., S. Wicaksono, J. E. Brown, and G. K. Macleod. 2010. Climate-driven body size trends in the ostracod fauna of the deep Indian Ocean. Palaeontology 53:1255-1268.
Hunt, G., M. J. Hopkins, and S. Lidgard. 2015 Simple versus complex models of trait evolution and stasis as a response to environmental change. PNAS 112:4885–4890.
Lande, R. 1979. Quantitative genetic analysis of multivariate evolution, applied to brain: body size allometry. Evolution 33:402-416.
Reitan, T., T. Schweder, and J. Henderiks. 2012. Phenotypic evolution studied by layered stochastic differential equations. The Annals of Applied Statistics. 6:1531–1551.
Revell, L. J., and L. Harmon 2008. Testing quantitative genetic hypotheses about the evolutionary rate matrix for continuous characters. Evolutionary Ecology Research 10:311–331.
Schweder, T. 1970. Composable Markov processes. Journal of Applied Probability 7:400–410.
Theriot, E. C., S. C., Fritz, C. Whitlock, C. and D. J. Conley. 2006. Late Quaternary rapid morphological evolution of an endemic diatom in Yellowstone Lake, Wyoming. Paleobiology 32:38–54.
Voje, K. L. 2020. Testing eco‐evolutionary predictions using fossil data: Phyletic evolution following ecological opportunity. Evolution 74:188–200.