13 Bayesian Meta-Analysis

In the last chapters, we have delved into somewhat more sophisticated extensions of meta-analysis, such as “multilevel” models (Chapter 10), meta-analytic structural equation modeling (Chapter 11), and network meta-analysis (Chapter 12). Now, we will take one step back and revisit “conventional” meta-analysis again, but this time from another angle. In this chapter, we deal with Bayesian meta-analysis.

We already covered a Bayesian model in the last chapter on network meta-analysis. There, we discussed the main ideas behind Bayesian statistics, including Bayes’ theorem and the idea of prior distributions (see Chapter 12.3.1). In the present chapter, we build on this knowledge and try to get a more thorough understanding of the “Bayesian way” to do meta-analysis. When we set up our Bayesian network meta-analysis model, for example, the {gemtc} package specified the priors automatically for us. Here, we will do this ourselves.

While its background is slightly more involved, we will see that Bayesian meta-analysis essentially aims to do the same thing as any “conventional” meta-analysis: it pools observed effect sizes into one overall (true) effect. Using a Bayesian model, however, also comes with several practical advantages compared to frequentist approaches. This makes it worthwhile to learn how we can implement such models using R.

13.1 The Bayesian Hierarchical Model

To perform a Bayesian meta-analysis, we employ a so-called Bayesian hierarchical model (Röver 2017; Julian Higgins, Thompson, and Spiegelhalter 2009). We already briefly covered this type of model in the network meta-analysis chapter (Chapter 12.3.2).

In Chapter 10, we learned that every meta-analytic model comes with an inherent “multilevel”, and thus hierarchical, structure. On the first level, we have the individual participants. Data on this level usually reaches us in the form of calculated effect sizes \(\hat\theta_k\) of each study \(k\). We assume that participants are nested within studies on the second level and that the true effect sizes \(\theta_k\) of different studies in our meta-analysis follow their own distribution. This distribution of true effects has a mean \(\mu\) (the pooled “true” effect we want to estimate) and variance \(\tau^2\), representing the between-study heterogeneity.

Let us try to formalize this. On the first level, we assume that the observed effect size \(\hat\theta_k\) reported in study \(k\) is an estimate of the “true” effect \(\theta_k\) in this study. The observed effect \(\hat\theta_k\) deviates from \(\theta_k\) due to the sampling error \(\epsilon_k\). This is because we assume that \(\hat\theta_k\) was drawn (sampled) from the population underlying \(k\). This population can be seen as a distribution with mean \(\theta_k\), the “true” effect of the study, and a variance \(\sigma^2\).

In the second step, we assume that the true effect sizes \(\theta_k\) themselves are only samples of an overarching distribution of true effect sizes. The mean of this distribution \(\mu\) is the pooled effect size we want to estimate. The study-specific true effects \(\theta_k\) deviate from \(\mu\) because the overarching distribution also has a variance \(\tau^2\), signifying the between-study heterogeneity. Taken together, this gives these two equations:

\[\begin{align} \hat\theta_k &\sim \mathcal{N}(\theta_k,\sigma_k^2) \notag \\ \theta_k &\sim \mathcal{N}(\mu,\tau^2) \tag{13.1} \end{align}\]

Here, we use \(\mathcal{N}\) to indicate that parameters to the left were sampled from a normal distribution. Some may argue that this is an unnecessarily strict assumption for the second equation (Julian Higgins, Thompson, and Spiegelhalter 2009), but the formulation as shown here is the one that is used most of the time. As covered before, the fixed-effect model is simply a special case of this model in which we assume that \(\tau^2 = 0\), meaning that there is no between-study heterogeneity, and that all studies share one single true effect size (i.e. that for all studies \(k\): \(\theta_k = \mu\)).

We can also simplify this formula by using the marginal form:

\[\begin{equation} \hat\theta_k \sim \mathcal{N}(\mu,\sigma_k^2 + \tau^2) \tag{13.2} \end{equation}\]
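We can check this equivalence with a quick simulation. The following sketch (with hypothetical values for \(\mu\), \(\tau\) and \(\sigma\); these are not estimates from any real data set) draws effect sizes according to the two-level formulation and confirms that their marginal variance is approximately \(\sigma_k^2 + \tau^2\):

```r
# Simulate the two-level model and check the marginal variance.
set.seed(123)
mu <- 0.3; tau <- 0.2; sigma <- 0.25                # hypothetical values
theta_k   <- rnorm(1e6, mean = mu, sd = tau)        # true study effects
theta_hat <- rnorm(1e6, mean = theta_k, sd = sigma) # observed effects
var(theta_hat)  # close to 0.25^2 + 0.2^2 = 0.1025
```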

You may have already detected that these formulas look a lot like the ones we defined when discussing the random-effects (Chapter 4.1.2) and three-level meta-analysis (Chapter 10.1) model. Indeed, there is nothing particularly “Bayesian” about this formulation. This changes, however, when we add the following equations (Williams, Rast, and Bürkner 2018):

\[\begin{align} (\mu, \tau^2) &\sim p(.) \notag \\ \tau^2 &> 0 \tag{13.3} \end{align}\]

The first line is particularly important. It defines prior distributions for the parameters \(\mu\) and \(\tau^2\). This allows us to specify a priori what we think the true pooled effect size \(\mu\) and between-study heterogeneity \(\tau^2\) may look like, and how certain we are about this. The second equation adds the constraint that the between-study heterogeneity variance must be larger than zero. However, this formula does not specify the exact type of prior distribution used for \(\mu\) and \(\tau^2\). It only tells us that some prior distribution is assumed. We will cover reasonable, specific priors for Bayesian meta-analysis models in more detail later.

In the chapter on network meta-analysis, we already covered the method through which Bayesian approaches estimate model parameters. To recap, this involves using Markov Chain Monte Carlo-based sampling procedures, for example Gibbs sampling. In the {brms} package, which we will be using in this chapter, so-called No-U-Turn sampling (NUTS, Hoffman and Gelman 2014) is used.

In the previous chapters, we primarily used the {meta} and {metafor} packages. These packages allow us to conduct meta-analyses based on a non-Bayesian, or frequentist, framework. Therefore, you might be wondering why one should start using Bayesian methods, given that we can already resort to such powerful tools using “conventional” approaches. The reason is that Bayesian meta-analysis comes with a few distinct advantages (Williams, Rast, and Bürkner 2018; McNeish 2016; Chung et al. 2013):

Bayesian methods allow us to directly model the uncertainty in our estimate of \(\tau^2\). They can also be superior in estimating pooled effects, particularly when the number of included studies is small (which is very often the case in practice).

Bayesian methods produce full posterior distributions for both \(\mu\) and \(\tau^2\). This allows us to calculate the exact probability that \(\mu\) or \(\tau^2\) is smaller or larger than some specified value. This is in contrast to frequentist methods, where we only calculate confidence intervals. However, (95%) confidence intervals only state that, if data sampling were repeated many, many times, the true value of a population parameter (such as \(\mu\) or \(\tau^2\)) would fall into the range of the confidence interval in 95% of the samples. They do not tell us the probability that the true parameter lies between two specified values.
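To illustrate this point, suppose we had posterior draws of \(\mu\) from a fitted model; here, simulated values (with hypothetical mean and standard deviation) stand in for the draws that a real model would return. The probability that \(\mu\) exceeds some threshold, say 0.2, is then simply the proportion of draws above it:

```r
# Hypothetical posterior draws of mu (stand-in for real model output).
set.seed(42)
post_mu <- rnorm(4000, mean = 0.35, sd = 0.08)
# Exact posterior probability that mu > 0.2:
mean(post_mu > 0.2)
```

No frequentist confidence interval can be read in this way; with posterior draws, such probability statements are a one-liner.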

Bayesian methods allow us to integrate prior knowledge and assumptions when calculating meta-analyses.

13.2 Setting Prior Distributions

Before, we formalized the hierarchical model we can use to pool effects in a Bayesian meta-analysis. However, to run such a model, we have to specify the prior distributions of \(\mu\) and \(\tau^2\). Particularly when the number of studies is small, priors can have a considerable impact on the results, so we should choose them wisely.

Generally, a good approach is to use weakly informative priors (Williams, Rast, and Bürkner 2018). Weakly informative priors can be contrasted with non-informative priors. Non-informative priors are the simplest form of a prior distribution. They are usually based on uniform distributions, and are used to represent that all values are equally credible.

Weakly informative priors, on the other hand, are a little more sophisticated. They rely on distributions which represent that we have a weak belief that some values are more credible than others. However, they are still not making any specific statements concerning the value of the parameter to be estimated from our data.

Intuitively, this makes a lot of sense. In many meta-analyses, for example, it seems reasonable to assume that the true effect lies somewhere between SMD = -2.0 and 2.0, but is unlikely to be SMD = 50. Based on this rationale, a good starting point for our \(\mu\) prior may therefore be a normal distribution with mean 0 and variance 1. This means that we grant an approximate 95% prior probability that the true pooled effect size \(\mu\) lies between −2.0 and 2.0:

\[\begin{equation} \mu \sim \mathcal{N}(0,1) \tag{13.4} \end{equation}\]
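We can check this prior probability directly with the cumulative distribution function of the standard normal distribution:

```r
# Prior probability mass of N(0, 1) between -2 and 2
pnorm(2) - pnorm(-2)
## [1] 0.9544997
```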

The next prior we have to specify is the one for \(\tau^2\). This one is a little more difficult since we know that \(\tau^2\) should always be non-negative, but can be (close to) zero. A recommended distribution for this case, and one which is often used for variances such as \(\tau^2\), is the Half-Cauchy prior. The Half-Cauchy distribution is a special case of a Cauchy distribution, which is only defined for one “half” (i.e. the positive side) of the distribution.

The Half-Cauchy distribution is controlled by two parameters. The first one is the location parameter \(x_0\), which specifies the peak of the distribution. The second one is \(s\), the scaling parameter. It controls how heavy-tailed the distribution is (i.e. how much it “spreads out” to higher values). The Half-Cauchy distribution is denoted with \(\mathcal{HC}(x_0,s)\).

The graph below visualizes the Half-Cauchy distribution for varying values of \(s\), with the value of \(x_0\) fixed at 0.
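Since the figure is not reproduced here, the densities can be sketched in base R, using the fact that the Half-Cauchy density on the non-negative half-line is simply twice the Cauchy density:

```r
# Half-Cauchy densities for several scaling parameters s (x0 = 0).
x <- seq(0, 5, length.out = 500)
plot(x, 2 * dcauchy(x, location = 0, scale = 0.2),
     type = "l", ylab = "Density", xlab = expression(tau))
lines(x, 2 * dcauchy(x, location = 0, scale = 0.5), lty = 2)
lines(x, 2 * dcauchy(x, location = 0, scale = 1),   lty = 3)
legend("topright", legend = c("s = 0.2", "s = 0.5", "s = 1"), lty = 1:3)
```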

The Half-Cauchy distribution typically has rather heavy tails, which makes it particularly useful as a prior distribution for \(\tau\). The heavy tails ensure that we still give very high values of \(\tau\) some probability, while at the same time assuming that lower values are more likely.

In many meta-analyses, \(\tau\) (the square root of \(\tau^2\)) lies somewhere around 0.3, or at least in the same ballpark. To specify the Half-Cauchy prior, we may therefore use \(s=\) 0.3. This ensures that values of \(\tau\) smaller than 0.3 receive a prior probability of 50% (Williams, Rast, and Bürkner 2018). We can confirm this using the Half-Cauchy distribution function implemented in the phcauchy function of the {extraDistr} package (Wołodźko 2020).

library(extraDistr)
phcauchy(0.3, sigma = 0.3)
## [1] 0.5

However, this is already a quite specific assumption concerning the true value of \(\tau\). A more conservative approach, which we will follow in our hands-on example, is to set \(s\) to 0.5; this makes the distribution flatter. In general, it is advised to always conduct sensitivity analyses with different prior specifications to check if they affect the results substantially. Using \(s=\) 0.5 as our parameter of the Half-Cauchy distribution, we can write down our \(\tau\) prior like this:
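We can compare both choices of \(s\) using the closed-form Half-Cauchy cumulative distribution function \(F(x) = \tfrac{2}{\pi}\arctan(x/s)\), here written directly in base R so that no additional package is needed:

```r
# Half-Cauchy CDF, F(x) = (2/pi) * arctan(x/s)
phc <- function(x, s) 2 / pi * atan(x / s)
phc(0.3, s = 0.3)  # 0.5, as confirmed with phcauchy above
phc(0.3, s = 0.5)  # about 0.34: the flatter prior is less committed
```

The flatter \(s=\) 0.5 prior assigns less of its probability mass to small values of \(\tau\), which is exactly the conservatism we want here.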

\[\begin{equation} \tau \sim \mathcal{HC}(0,0.5) \tag{13.5} \end{equation}\]

We can now put together the formulas of the hierarchical model, and our prior specifications. This leads to the complete model we can use for our Bayesian meta-analysis:

\[\begin{align} \hat\theta_k &\sim \mathcal{N}(\theta_k,\sigma_k^2) \notag \\ \theta_k &\sim \mathcal{N}(\mu,\tau^2) \notag \\ \mu &\sim \mathcal{N}(0,1) \notag \\ \tau &\sim \mathcal{HC}(0,0.5) \tag{13.6} \end{align}\]

13.3 Bayesian Meta-Analysis in R

Now that we have defined the Bayesian model for our meta-analysis, it is time to implement it in R. Here, we use the {brms} package (Bürkner 2017b, 2017a) to fit our model. The {brms} package is a very versatile and powerful tool to fit Bayesian regression models. It can be used for a wide range of applications, including multilevel (mixed-effects) models, generalized linear models, multivariate models, and generalized additive models, to name just a few. Most of these models require person-level data, but {brms} can also be used for meta-analysis, where we deal with (weighted) study-level data.

Before we start fitting the model, we first have to install and load the {brms} package.

library(brms)

13.3.1 Fitting the Model

In our hands-on example, we again use the ThirdWave data set, which contains information from a meta-analysis investigating the effects of “third wave” psychotherapies in college students (Chapter 4.2.1). Before we fit the model, let us first specify the prior distribution of the overall effect size \(\mu\) and the between-study heterogeneity \(\tau\). Previously, we defined that \(\mu \sim \mathcal{N}(0,1)\) and \(\tau \sim \mathcal{HC}(0,0.5)\).

We can use the prior function to specify the distributions. The function takes two arguments. In the first argument, we specify the distribution we want to assume for our prior, including the distribution parameters. In the second argument, we have to define the class of the prior. For \(\mu\), the appropriate class is Intercept, since it is a fixed population-level effect. For \(\tau\), the class is sd, because it is a variance (or, to be more precise, a standard deviation). We can define both priors using the prior function, then concatenate them, and save the resulting object as priors.

priors <- c(prior(normal(0, 1), class = Intercept),
            prior(cauchy(0, 0.5), class = sd))

We are now ready to fit the model using the brm function. In the formula, TE|se(seTE) specifies the effect size of each study and its standard error, while 1 + (1|Author) adds a random intercept (i.e. a study-specific true effect) for each study in ThirdWave. We also pass our prior specifications and the number of sampling iterations, and save the result as m.brm.

m.brm <- brm(TE|se(seTE) ~ 1 + (1|Author),
             data = ThirdWave,
             prior = priors,
             iter = 4000)

Unfortunately, there is currently no maintained package to directly create forest plots from {brms} models. But it is possible to build them oneself using functions of the {tidybayes} package (Kay 2020). So, let us first load this package along with a few other ones before we proceed.

library(tidybayes)
library(dplyr)
library(ggplot2)
library(ggridges)
library(glue)
library(stringr)
library(forcats)

Before we can generate the plot, we have to prepare the data. In particular, we need to extract the posterior distribution for each study individually (since forest plots also depict the specific effect size of each study). To achieve this, we can use the spread_draws function in the {tidybayes} package. The function needs three arguments as input: our fitted {brms} model, the random-effects factor by which the results should be indexed, and the parameter we want to extract (here b_Intercept, since we want to extract the fixed term: the effect size).

Using the pipe operator, we can directly manipulate the output. Using the mutate function in {dplyr}, we calculate the actual effect size of each study by adding the pooled effect size b_Intercept to the estimated deviation of each study. We save the result as study.draws.

study.draws <- spread_draws(m.brm, r_Author[Author,], b_Intercept) %>%
  mutate(b_Intercept = r_Author + b_Intercept)

Next, we want to generate the distribution of the pooled effect in a similar way (since in forest plots, the summary effect is usually displayed in the last row). We therefore slightly adapt the code from before, dropping the second argument to only get the pooled effect. The call to mutate only adds an extra column called "Author". We save the result as pooled.effect.draws.

pooled.effect.draws <- spread_draws(m.brm, b_Intercept) %>%
  mutate(Author = "Pooled Effect")

Next, we bind study.draws and pooled.effect.draws together in one data frame. We then start a pipe again, calling ungroup first, and then use mutate to (1) clean the study labels (i.e. replace dots with spaces), and (2) reorder the study factor levels by effect size (high to low). The result is the data we need for plotting, which we save as forest.data.

forest.data <- bind_rows(study.draws, pooled.effect.draws) %>%
  ungroup() %>%
  mutate(Author = str_replace_all(Author, "[.]", " ")) %>%
  mutate(Author = reorder(Author, b_Intercept))

Lastly, the forest plot should also display the effect size (SMD and credible interval) of each study. To do this, we use our newly generated forest.data data set, group it by Author, and then use the mean_qi function to calculate these values. We save the output as forest.data.summary.

forest.data.summary <- group_by(forest.data, Author) %>%
  mean_qi(b_Intercept)

We are now ready to generate the forest plot using the {ggplot2} package. The code to generate the plot looks like this:

ggplot(aes(b_Intercept,
           relevel(Author, "Pooled Effect", after = Inf)),
       data = forest.data) +

  # Add vertical lines for pooled effect and CI
  geom_vline(xintercept = fixef(m.brm)[1, 1],
             color = "grey", size = 1) +
  geom_vline(xintercept = fixef(m.brm)[1, 3:4],
             color = "grey", linetype = 2) +
  geom_vline(xintercept = 0, color = "black", size = 1) +

  # Add densities
  geom_density_ridges(fill = "blue",
                      rel_min_height = 0.01,
                      col = NA, scale = 1, alpha = 0.8) +
  geom_pointintervalh(data = forest.data.summary, size = 1) +

  # Add text and labels
  geom_text(data = mutate_if(forest.data.summary,
                             is.numeric, round, 2),
            aes(label = glue("{b_Intercept} [{.lower}, {.upper}]"),
                x = Inf), hjust = "inward") +
  labs(x = "Standardized Mean Difference", # summary measure
       y = element_blank()) +
  theme_minimal()

Observed Versus Model-Based Effect Sizes

One thing is very important to mention here. The effect sizes displayed in the forest plot do not represent the observed effect sizes of the original studies, but the estimate of the “true” effect size (\(\theta_k\)) of a study based on the Bayesian model. The dots shown in the forest plot are equivalent to the study-wise estimates we saw when extracting the random effects using ranef (except that these values were centered around the pooled effect).

Furthermore, looking at the values of studies with very high effect sizes (e.g. outliers such as “DanitzOrsillo” and “Shapiro et al.”), we see that the model-based effect sizes are closer to the overall effect \(\hat\mu\) than their initially observed values.

This shrinkage to the mean is typical for hierarchical models with a common overarching distribution, such as meta-analytic random-effects models. During the estimation process, the Bayesian model “supplements” information on the effect of one study \(k\) with information on the overall distribution of true effect sizes as jointly estimated by all effect sizes \(K\) in the meta-analysis.

Such “borrowing of strength” means that the values of studies with extreme effects are pulled towards the mean (Lunn et al. 2012, chap. 10.1). This behavior is more pronounced for studies which provide relatively little information (i.e. studies with large standard errors).
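A back-of-the-envelope sketch makes this pulling effect concrete. Conditional on known values of \(\mu\) and \(\tau^2\), the model-based estimate of \(\theta_k\) is approximately a precision-weighted average of the observed effect and the pooled effect; the numbers below are hypothetical and only chosen for illustration:

```r
# Shrinkage sketch with hypothetical values (not model estimates):
theta_hat <- 1.9   # large observed effect (an "outlier")
se_k      <- 0.4   # its standard error
mu        <- 0.57  # pooled effect
tau2      <- 0.08  # between-study heterogeneity variance

w <- tau2 / (tau2 + se_k^2)  # weight given to the study's own data
theta_model <- w * theta_hat + (1 - w) * mu
theta_model  # pulled from 1.9 towards the pooled effect
```

The larger the standard error \(se_k\), the smaller the weight \(w\), and the more strongly the study's estimate is shrunk towards \(\mu\); precisely the behavior we observed in the forest plot.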

\[\tag*{$\blacksquare$}\]

13.4 Questions & Answers

Test your knowledge!

1. What are the differences and similarities between the “conventional” random-effects model and a Bayesian hierarchical model?

2. Name three advantages of Bayesian meta-analyses compared to their frequentist counterpart.

3. Explain the difference between a weakly informative and uninformative prior.

4. What is a Half-Cauchy distribution, and why is it useful for Bayesian meta-analysis?

5. What is an ECDF, and how can it be used in Bayesian meta-analyses?

Answers to these questions are listed in Appendix A at the end of this book.

13.5 Summary

While meta-analysis is usually conducted using frequentist statistics, it is also possible to conduct Bayesian meta-analyses.

Bayesian meta-analysis is based on the Bayesian hierarchical model. The core tenets of this model are identical to the “conventional” random-effects model. The difference, however, is that (informative, weakly informative or uninformative) prior distributions are assumed for \(\mu\) and \(\tau^2\).

For Bayesian meta-analysis models, it is usually a good idea to assume weakly informative priors. Weakly informative priors are used to represent a weak belief that some values are more credible than others.

To specify the prior distribution for the between-study heterogeneity variance \(\tau^2\), the Half-Cauchy distribution can be used. Half-Cauchy distributions are particularly suited for this task because they are only defined for positive values, and possess heavier tails. This can be used to represent that very high values of \(\tau^2\) are less likely, but still very much possible.

When fitting Bayesian meta-analysis models, it is important to (1) always check if the model included enough iterations to converge (for example by checking the \(\hat{R}\) values), and to (2) conduct sensitivity analyses with different prior specifications to evaluate the impact on the results.


