Resolving the species pool dependence of beta-diversity using coverage-based rarefaction

1. Biodiversity is non-randomly distributed in space and understanding how spatial structure of species diversity responds to ecological, biogeographic and anthropogenic drivers is one of the major quests of modern ecology. However, metrics of community differentiation such as Whittaker’s beta-diversity fail to unambiguously capture species turnover when they are compared across assemblages with changing species pool sizes, species abundance distributions and numbers of individuals. This is due to an effect of changing sample completeness. 2. Here, we propose a method that combines the benefits of individual-based and coverage based rarefaction to standardize beta-diversity estimates by sample completeness at the gamma-scale. We test the performance of our metric, , using simulations of spatially explicit communities that change both species pool size and intraspecific spatial aggregation. Additionally, we re-analyse the “Gentry” plot dataset to investigate patterns of tree beta-diversity along a latitudinal gradient.


Introduction
Organisms are non-randomly distributed across the globe and understanding spatial patterns of species diversity and composition from ecological samples is one of the most central questions driving ecological research (Gaston, 2000;McGill, 2011a;Worm and Tittensor 2018). A fundamental feature of biodiversity that makes it different from other response variables in ecology is that it is scale-dependent (Rosenzweig, 1995;McGill 2011b;Chase and Knight 2013). That is, local biodiversity increases at a decelerating rate with increasing spatial scale (e.g., the species-area relationship), and therefore many patterns of biodiversity depend critically on scale (e.g., Rahbek 2005; Harrison et al. 2006;Chase et al., 2018). As a result of the growing awareness of the importance of spatial scale in biodiversity measurements and interpretations of possible mechanisms, there has been a steady increase in interest in measuring and comparing spatial beta-diversity (Anderson et al. 2011;De Cáceres et al. 2012, Sreekar et al., 2018Xing & He 2019). Beta-diversity, as first defined by Whittaker (1960Whittaker ( , 1972, offers a mathematical link between local (i.e. alpha) and regional (i.e. gamma) species diversity, and quantifies how species composition varies from site to site (i.e., species turnover) within a region. In addition to providing important insights regarding how biodiversity scales with area, spatial beta-diversity can, for example, shed light onto the metacommunity processes that shape biological assemblages (Chase & Myers, 2011), inform biodiversity conservation (Socolar, Gilroy, Kunin & Edwards, 2016), and help understand the provisioning of ecosystem functions and services (Mori, Isbell, & Seidl, 2018).
Although beta-diversity is conceptually quite important, in practice, its measurement has been highly ambiguous. First, there are multiple metrics and approaches for measuring beta-diversity that are often used interchangeably, despite their sensitivity to very different aspects of community differences (Legendre, Borcard & Peres-Neto, 2005;Baselga, 2010;Tuomisto 2010a, b;Anderson et al., 2011;Chao & Chiu, 2016). Second, meaningful inference from changes of beta-diversity is often complicated due to the entanglement of multiple components of species diversity that collectively influence sample-derived estimates of beta-diversity. Beta-diversity is commonly thought to represent non-random spatial structure in species diversity (i.e. intraspecific spatial aggregation or species turnover in space). But it also is inherently a bridge between the local and regional scales, and as such it is unavoidably intertwined with the nature of biodiversity at these scales. For example, like virtually all diversity metrics, it responds to the size of the regional species pool, the shape of the regional species abundance distribution (SAD), and the number of individuals captured by the samples (McGill, 2011, Chase & Knight, 2013. Here, we argue that these non-spatial components of beta-diversity confound beta-diversity through their effects on sample completeness (Colwell & Coddington, 1994). For instance, regions with very large species pools typically exhibit high beta-diversity simply because local samples capture a small and incomplete portion of their diversity, which can lead to strong, but random, differentiation among samples . Likewise, undersampling can inflate beta-diversity when there are many undetected rare species in an assemblage (e.g., when species evenness is very low) or when the sample size is very small (i.e., few individuals are captured by a sample) (Barwell, Isaac & Kunin 2015).
Although such sampling effects are ubiquitous in ecological studies (Colwell & Coddington, 1994;Gotelli & Colwell, 2001), they are not well addressed with respect to beta-diversity (Wolda 1981;Beck, Holloway, & Schwanghart, 2013, Chao & Chiu, 2016. There have been attempts to develop beta-diversity indices that account for unseen species (Chao, Chazdon, Colwell, & Shen, 2005). However, evidence for their robustness has been questioned using simulated and empirical data (Beck Holloway, & Schwanghart, 2013;Cardoso, Borges, & Veech, 2009). While single-scale diversity estimates can be standardized using coverage-based rarefaction which accounts for differences in species pool size (e.g., Chao and Jost 2012), this technique has not been applied to beta-diversity. Instead, empirical studies often adopt null model approaches to correct for species pool effects (and associated levels of undersampling), when evaluating betadiversity patterns . The intent of such null models is to disentangle how much of the observed beta-diversity is due to non-random patterns of species distributions (i.e., patterns of intraspecific aggregation). While useful, for example, in disentangling the drivers of beta-diversity along large-scale gradients in gamma diversity (e.g., Kraft et al., 2011;Myers et al., 2013;Stegen et al. 2013, Xu, Chen, Liu & Ma, 2015, there is debate about how to best generate the null distributions for these models. For example, null distributions can be generated maintaining alpha diversity, gamma diversity, abundance structures and/or numbers of individuals (Mori, Fujii, Kitagawa & Koide, 2015;Qian, Wang & Zhang, 2012;Kraft et al., 2012;Brocklehurst, Day & Fröbisch, 2018). Although there is an emerging consensus that abundance-based null models are superior to incidence-based ones (Tucker et al., 2016), these approaches have been criticized because they ignore the completeness of the samples (Ulrich et al. 2017, Sreekar et al. 2018).
Here, we develop a rarefaction approach that is conceptually similar to an abundance-based null model in order to standardize beta-diversity estimates by the completeness of the gamma scale sample. To achieve this, we build on our previous development of beta-diversity within the Measurements of Biodiversity ("MoB") framework (Chase at el, 2018;McGlinn et al., 2019), and combine it with coverage-based rarefaction (Chao and Jost 2012). In the MoB framework, which itself is built upon previous work with individual-based rarefaction curves (e.g., Gotelli and Colwell 2001), we compare diversity values from individual-based rarefaction curves taken from smaller (local) and larger (regional) scale samples to estimate the degree to which species are aggregated (clumped) in the landscape as a measure of beta-diversity (see also Olszewski 2004, Dauby andHardy 2012). While the resulting metrics of beta-diversity are useful for comparing patterns of intraspecific aggregation within a given regional species pool (i.e., comparing observational or experimental results from a single region), we will show that they still retain an undersampling bias when used to compare across species pools of different sizes. As a solution, we show that a coverage-based beta-diversity metric enables comparisons of the magnitude of intraspecific aggregation notwithstanding changes in the species pool. We test our method on simulated spatial point patterns with different degrees of spatial structure (intraspecific spatial aggregation) and varying species pool sizes, and apply it to a case study of an empirical dataset of trees along a latitudinal gradient.
Calculating beta-diversity from rarefaction curves Individual-based rarefaction (IBR) is a common tool to standardize species richness estimates from different sample sizes (Hurlbert, 1971;Gotelli & Colwell, 2001). Furthermore, IBR can also be used to examine beta-diversity by comparing curves estimated from smaller (alpha diversity) samples nested within larger (gamma diversity) samples (Olszewski 2004, Dauby and Hardy 2012, McGlinn et al. 2019. IBR curves estimate the non-linear scaling relationship between the number of individuals in a sample and species richness (i.e., rarefied richness). The shape of the curve is determined by the SAD (i.e., species pool size and evenness), and the slope at any point along the curve is related to the sample completeness corresponding to the sample size at that point. For a given sample, the IBR curve can be calculated using the following formula derived from the hypergeometric probability distribution (Hurlbert, 1971): S n is the rarefied richness, or the expected number of species found in a subsample of n individuals (n<N). S obs is the observed total number of species, N is the total number of individuals in the sample and X i is the number of individuals of the i th species.
By constructing IBR curves at two or more nested spatial scales, we can assess intraspecific spatial aggregation and beta-diversity. Like most classical approaches to beta-diversity, we define alpha as the mean diversity of the sample scale or the scale of a localized subset of samples, and gamma as the total diversity of multiple samples or local subsets of samples (ref). Accordingly, for a given assemblage with spatially replicated samples, the alpha scale IBR curve is made up of the mean S n values across all samples (ߙܵ ) and the gamma scale curve consists of the S n values of the pooled samples (ߛܵ ). While the alpha scale is affected by spatial turnover within the assemblage, the gamma scale breaks up any lower scale spatial structure by randomly accumulating individuals from all the samples. If species are distributed randomly among samples (i.e., no aggregation), the alpha and gamma scale IBR curves sit on top of each other. Downward and upward deviations of the alpha scale curve, then, would be interpreted as intraspecific aggregation and overdispersion, respectively . The gamma scale IBR is conceptually very similar to abundance based null models (e.g. , but it uses an analytical formula for rarefaction rather than a shuffling algorithm. Using IBR curves, Figure 1 illustrates how beta-diversity of a reference assemblage (Fig. 1A) responds to changes in the size of the species pool (Fig. 1B), the numbers of individuals (Fig.  1C) and intraspecific spatial aggregation (Fig. 1D). Traditional (i.e., Whittaker 1972) ) is represented as the height ratio of the two curves at the respective right-hand end of the curves (dashed horizontal lines). In each of the four examples, there is beta-diversity, but for very different underlying reasons. Only the assemblage underlying Fig. 1D exhibits the sort of spatial turnover in species composition (due to aggregation) that we usually envision as an underlying mechanism for beta-diversity. In the other cases (A-C), beta-diversity only emerges due to sampling (i.e., a more-individuals effect between alpha and gamma scale). In order to evaluate the degree of intraspecific aggregation in a region, Chase et al. (2018) recommend that beta-diversity should be calculated from alpha and gamma scale richness values rarefied to a common number of individuals rather than from the observed species richness. The resulting ratio, termed ߚ ௌ , quantifies the deviation of the alpha curve from the gamma curve at a common number of individuals (n) as: Where n corresponds to the number of individuals observed at the alpha scale.

ߚ ௌ
is related to the metric developed by Olszewski 2004 (see also Dauby and Hardy 2012), who compared the differences of these curves at their base, rather than at n individuals as we have done here. ߚ ௌ is also conceptually similar to abundance-based null models that tease apart the influence of aggregation from sampling effects in determining patterns of beta-diversity when comparing among sites within a regional species pool (e.g., Chase 2010, Myers et al., 2015. When assemblages have a random spatial structure, ߚ ௌ is expected to equal 1 regardless of species pool and sample size (Fig 1 A-C). Conversely, values larger than 1 reflect spatial aggregation or species turnover among sites in the region (Fig 1 D). While is a useful metric for estimating the degree of aggregation among treatments or observations within a given species pool, it suffers from the same bias associated with sampling that influences most measures of biodiversity. Thus, as we will illustrate below, is less useful in comparing the degree of aggregation among regions where the size of the regional species pool changes (e.g., along biogeographical gradients).
The deviation between alpha and gamma scale IBR curves is due to spatial structure (i.e.

≠ 1 )
. However, the magnitude of this deviation is contingent on the size of the species pool. To visualize this interaction, consider two assemblages each composed of two patches, but which differ in the size of their regional species pool (500 vs 100 species Fig. 2A and 2B, respectively). Supposing that both assemblages have complete species turnover between their respective patches (i.e., species are maximally aggregated at the patch scale), Figure 2 shows the IBR 6 cal D ns ed -50 R or ng ful es ≠ To ch y). ve R curves that we would expect to find if we sampled 500 individuals from each patch in the large ( Fig. 2A) and small (Fig. 2B) species pools. While both assemblages are maximally and equally structured at the patch scale, is substantially higher in the community with the small species pool than in the community with the large species pool. This is because for equal sample sizes (i.e., equal numbers of individuals), samples from a large species pool are less complete than samples from a small species pool (i.e., further from a hypothetical asymptote; Chao and Jost, 2012). Thus, there is a downward bias in that increases with decreasing completeness of the gamma-scale estimate (Fig. 2C).

Method
We propose an approach that accounts for the sample completeness of an assemblage, in order to evaluate the degree of aggregation leading to beta-diversity when species pool size differs. Following Chase et al. (2018), we use IBR within assemblages to standardize alpha and gamma diversity estimates to a common number of individuals n. This removes the influence of the more-individuals effect influencing beta-diversity that comes about due to the difference in sample size between the alpha and gamma scales. However, because we are interested in cases where the species pool size varies, rather than keeping sample-size constant across assemblages (via n), we instead maintain a consistent sample coverage at the gamma scale. Sample coverage is a measure of sample completeness that ranges from 0 to 1 and refers to the "proportion of the total number of individuals in a community that belong to the species  (Chao & Jost, 2012). Coverage can be estimated from the number of rare species in a sample (Good, 1953;Chao & Shen, 2010). As it is directly related to the slope of the IBR curve, it can also be interpolated to any smaller than observed sample size, n, using the following equation (Chao & Jost, 2012) : C n is the expected coverage for a subsample of sample size n. N is the total number of individuals in the sample and X i is the number of individuals of the i th species. Eqn 3 can be used to assign coverage estimates to any sample size used in IBR, which is the basis for coverage-based rarefaction. For coverage-based standardization, one has to rearrange Eqn 3 to determine how many individuals (n) are necessary to obtain a given target coverage C target (usually the smallest coverage of all communities). This is computed numerically by calculating C n for every possible n between 1 and N, and choosing the n that minimises the difference between C n and C target . Subsequently, IBR is applied to standardize the diversity estimate to a sample size of n and thus the desired coverage level C target (e.g. Hsieh, Ma, & Chao, 2016).
Here, we adopt the same procedure to standardize beta-diversity. Figure 3 illustrates the two-scale IBR curves corresponding to hypothetical large (Fig. 3A) and small (Fig. 3B) species pools. Again we quantify the deviation between gamma and alpha scale as ߚ ௌ at a common n. However, we allow n to vary between the two scenarios to maintain a constant gamma-scale coverage (indicated by the slope of the tangential lines). The resulting pair of ߚ ௌ values are very similar (compare with Fig. 2), which represents the fact that both scenarios are equally aggregated at the patch scale. The advantage of matching ߚ ௌ values by gamma scale coverage becomes particularly clear when we consider the entire scaling relationship of ߚ ௌ (Fig. 3 C). If we quantify ߚ ௌ for every possible value of n and plot them against gamma-scale expected coverage, the values from large and small species pool fall on approximately the same line and the species pool dependence vanishes. We implemented this procedure in an R package available at https://github.com/T-Engel/betaC. It provides the function "C_target" that can be used for steps 1.1 and 1.2. Furthermore, it contains the function "beta_C" that carries out step 2. These functions adopted code that was originally written by Hsieh et al. (2016) for the R package iNEXT.
Our approach can be applied to a wide range of ecological datasets where changes in spatial structure of species diversity and species pools are entangled along a common gradient. Like all IBR methods, our approach requires species abundance data (e.g., site by species matrices with abundance data). Every assemblage should have spatially replicated samples, so that one can define at least two nested sampling scales (alpha and gamma). We assume that the sampling design is standardized across all assemblages. This means there should be a consistent number of samples per assemblage and every sample should have the same effort (e.g., plot size, trap nights, etc.). Furthermore, we assume a consistent spatial extent. If the number of samples changes from one assemblage to the next, we suggest that users take a spatially constrained subset of the samples to keep extents as consistent as possible. Likewise, other deviations from the 'ideal' scenario, such as correcting for unequal sampling effort, can be accomplished prior to deriving the values described below. Similar to Whittaker's (1962) multiplicative beta, ߚ is calculated as a ratio of gamma over alpha. This makes ߚ more intuitive than metrics of beta-deviation that are calculated in null model approaches (e.g., Kraft et al. 2011) because it can be interpreted as an effective number of distinct communities (sensu, Jost, 2007).

Proof of concept using simulations
To test the performance of , we simulated spatially explicit assemblages that varied in the size of the species pool and the degree of intraspecific aggregation using multivariate spatial point patterns. We used the R package mobsim  to carry out the simulations. For each simulated assemblage, we first drew 4000 individuals from a lognormal SAD that was parameterized with a given species pool size and a coefficient of variation of 1. Then, we used the Thomas cluster process to distribute individuals in space, varying the degree of intraspecific spatial aggregation through the parameter for the number of conspecific clusters. The simulation was parameterized in a full-factorial design, where the species pool size encompassed every integer between 10 and 500, and the number of clusters was set to 1 (i.e., extreme intraspecific aggregation), 4, 10 and 20. We also implemented a random Poisson process to simulate completely random spatial distributions for species as an additional level of within species aggregation. Each combination of species pool and aggregation was replicated 3 times yielding a total of 7365 simulated communities. To sample from the regional communities, we placed 4 sample quadrats into each simulated community (Fig. 4 A, B) was 50. The R code for the simulation is available on github (https://github.com/T-Engel/betaC) and will be archived upon acceptance of this manuscript.
All three beta-diversity indices (Whittaker's ߚ , ߚ ௌ , a n d ߚ ) were influenced by intraspecific community aggregation. Additionally, Whittaker's ߚ and ߚ ௌ were affected by the changes in species pool size, whereas ߚ was insensitive to this factor. While both Whittaker's ߚ and ߚ ௌ responded to the degree of spatial aggregation and species pool, they did so in contrasting ways. For Whittaker's ߚ , sensitivity to the size of the species pool diminished with increasing aggregation; for ߚ ௌ , the effect of the species pool increased with increasing aggregation (Fig. 4  C). Only ߚ measures the spatial structure of the simulated communities independently of the species pool size, and is therefore recommendable for quantifying beta-diversity across regional species pools of varying sizes.

Empirical case study
We applied our approach to a dataset that has been used frequently to examine patterns of beta-diversity across biogeographic (specifically latitudinal) gradients-the Gentry Forest plot dataset (Gentry, 1988;Phillips & Miller, 2002). This dataset has been used to disentangle changes of beta-diversity and species pool using different null model approaches Qian & Song, 2012;Qian et al., 2013;Xu et al., 2015). We obtained the dataset through the "BIEN" library in R (Maitner et al., 2018). We restricted our analysis to the plots located in the Americas in order to focus explicitly on the influence of latitude rather than other biogeographic effects, and only included plots where data from 10 subplots were available and where each subplot contained at least 5 stems. This resulted in 140 plots with a latitudinal range from -40.717°S to 45.553°N; total species richness was strongly negatively correlated with latitude ranging from 12 to 257 (Fig. 5A).
We defined the plot scale (i.e., all 10 subplots) as the gamma scale, and two neighboring subplots as the alpha scale. Previous analyses have used one subplot as the alpha scale, but we used two to achieve a better sample completeness (nevertheless, our results were qualitatively comparable regardless of the alpha-scale we used; not shown). We computed Whittaker's = 0.14). First, as previously shown (e.g., Kraft et al. 2011), we found that Whittaker's beta-diversity declined with latitude (Fig. 5B). In addition to a potential signal of spatial structure, this effect includes the differences in sample size between alpha and gamma scale, as well as changes in species pool size and associated changes in sample completeness. When we control for number of individuals using ߚ ௌ , the trend is reversed and there is an increase in beta-diversity with increasing latitude (Fig. 5C). However, as we showed above, these results are still affected by the effect of the species pool and undersampling because sampling coverage among these plots differs. Fig. 5D shows the latitudinal gradient in coverage C n corresponding to the sample size of 12 that was used for the calculation of ߚ ௌ .
In some of the temperate plots, 12 stems can correspond to 75% and more of the species pool estimated from the sample, whereas in many of the species-rich tropical plots this sample size only captures as little as 5% of the species (Fig. 5D). ߚ , which controls for this difference in sample completeness, showed no change along the latitudinal gradient (Fig. 5E). We conclude that at the given scale there is no change of spatial aggregation along this gradient. But this does not mean that the spatial structure is random. As indicated by ߚ > 1, we find a signal for spatial aggregation across the gradient even at this small scale and sample completeness.
The target coverage of 14 % is very low for this dataset, which supports the argument that these plots are most likely too small (especially in the tropics) to appropriately compare beta-diversity trends (Tuomisto & Ruokolainen, 2012). Nevertheless, they still provide an important test case for the analyses we advocate here, particularly given the previous analyses of these data and the discussion regarding appropriate null models that ensued Qian & Song, 2012;Tuomisto & Ruokolainen, 2012;Kraft et al., 2012;Qian et al., 2013;Xu et al., 2015). Although we come to qualitatively similar results as the abundance-based null models of  and Xu et al. (2015), our method has the advantage of explicitly incorporating an estimate of sample completeness. This means that users are confronted with the quality of their data and have to interpret the results in the light of any limitations.

Discussion and conclusions
Adjusting sample size from one location to the other to achieve similar sample completeness is not a new idea for the estimation of beta-diversity, but there has been a lack of statistical methods to achieve this (Colwell and Coddington, 1994). Building on previous work using rarefaction and coverage-based tools (e.g., Colwell and Coddington, 1994;Chao and Jost, 2012;Chao et al., 2014;Chase et al., 2018;McGlinn et al., 2019), we provide an approach that standardizes beta-diversity by sample coverage to quantify the degree of spatial structuring (aggregation). Our simulations of spatially explicit assemblages show that the resulting metric, is influenced by changes in spatial structure, but remains unaffected by changes in species pool size. This allows for comparisons of spatial structure along large biogeographic gradients 14 eir al r's s) te < is al ng st, at ng ic, es ts with changing species pools. In our empirical case study, we show that spatial structure in tree communities did not change along a latitudinal gradient of forest plots.
Although we standardize by sample coverage, it is nonetheless important to ensure that the samples have somewhat meaningful and comparable completeness. As our empirical case study shows, this can be challenging when the differences in species pool size are very large and sample sizes (e.g., plot size) are relatively small. Within our method, the target coverage value used for the standardization is set by the site with the lowest gamma-scale completeness, which for the Gentry Forest Plot data presented above (Gentry 1988, Phillips & Miller, 2002 was 0.14, which is quite low. For some of the species poor temperate plots, this meant that sample size had to be scaled down to as little as one or two individuals (i.e., n j ) to achieve the same value of coverage. From a theoretical point of view, the n j value should be 2 or higher because only then can there be a deviation between alpha and gamma scale IBR curve; rarefying down to one individual always yields 1 species. As a result, many of the low diversity plots had to be excluded, as ߚ for this very low coverage was not defined. In a separate analysis, we increased the target coverage value to 50 %, which meant that many of the species rich plots had to be excluded because their sample coverage was not high enough (see Supporting Information S1). Having controlled for variation in sample completeness, we find no systematic change of spatial structure along the latitudinal gradient. Nonetheless, we emphasize that our empirical case study merely serves as a proof of concept, and we share the concern that due to the small plot size, these data have limited power to reveal any changes in assembly processes along the latitudinal gradient (Tuomisto & Ruokolainen, 2012). ߚ quantifies intraspecific spatial aggregation whilst controlling for the fraction of the estimated species pool in the samples (i.e., coverage). Others have suggested that beta-diversity should be calculated from only a subset of the most dominant species in a sample, but this approach still responds to changes in species pool size and the SAD (Roden et al., 2018). In contrast, our coverage-based method makes use of the SAD of the samples (i.e., the shape of the IBR curve) to make inferences for a constant fraction of the species pool. Ulrich et al. (2017) argue that null models cannot disentangle species pool and aggregation unless they incorporate external data on the sizes of the relevant species pools. However, we argue that the sample itself can provide an estimate of its completeness (Good, 1953, Chao andJost 2012). Nevertheless, coveragebased estimates, implicitly assume that there is an asymptote in species diversity (Chao and Jost, 2012). While this assumption is mathematically convenient and the asymptote can be approximated for a given sample, it cannot strictly be true (i.e., we will always find more species with more samples until the entire global pool is sampled; Williamson et al,. 2001). The other important assumption is that the gamma scale is a somewhat random subset of the species pool, which is probably not the case when there is spatial structure at larger scales and when the number of samples is relatively small. Nevertheless, as we presented it here, ߚ does not require estimation of the asymptote itself, but merely uses interpolation to develop a useful approximation of sample completeness. Our approach could be extended to encompass extrapolation, but this should be done with caution (Chao et al., 2014).
In conclusion, we have developed an approach to address species pool dependence and associated sampling effects in beta-diversity. Unlike other null model approaches, we explicitly treat species pool dependence as a problem of undersampling, and address it by standardizing beta-diversity by sample coverage. As a result, this approach allows deeper insights into the scale-dependent and multivariate nature of biodiversity patterns . Applications could, for example, shed light onto the assembly processes that govern biological (meta-)communities along biogeographic gradients or contribute to a better understanding of the scale-dependent biodiversity trends observed during the current biodiversity crisis.