Prevalence, statistical thresholds, and accuracy assessment for species distribution models

Abstract. For species distribution models, species frequency is termed prevalence, and prevalence in unbiased samples should be similar to natural species prevalence. However, modelers commonly adjust sampling prevalence, producing a modeling prevalence that has a different frequency of occurrences than sampling prevalence. The separate effects of (1) use of sampling prevalence compared to adjusted modeling prevalence and (2) modifications necessary in thresholds, which convert continuous probabilities to discrete presence or absence predictions, to account for prevalence, are unresolved issues. We examined effects of prevalence and thresholds and two types of pseudoabsences on model accuracy. Use of sampling prevalence produced similar models compared to use of adjusted modeling prevalences. Mean correlation between predicted probabilities of the least (0.33) and greatest modeling prevalence (0.83) was 0.86. Mean predicted probability values increased with increasing prevalence; therefore, unlike constant thresholds, varying threshold to match prevalence values was effective in holding true positive rate, true negative rate, and species prediction areas relatively constant for every modeling prevalence. The area under the curve (AUC) values appeared to be as informative as sensitivity and specificity when using surveyed pseudoabsences as absent cases, but when the entire study area was coded as absent, AUC values reflected the area of predicted presence. Less frequent species had greater AUC values when pseudoabsences represented the study background. Modeling prevalence had a mild impact on species distribution models and accuracy assessment metrics when threshold varied with prevalence. Misinterpretation of AUC values is possible when AUC values are based on background absences, which correlate with frequency of species.


Introduction
Species distribution models predict the occurrence probability of species in space based on limited known locations where species are present. Although modeling with sampling prevalence, or frequency of samples that contain the species, may be an appropriate approach (Real et al., 2006; Albert and Thuiller, 2008; Ward et al., 2009; Li et al., 2011; Meynard and Kaplan, 2012), there also are potential limitations. Sampling prevalence may not reflect natural species prevalence, or frequency of the species over the entire study extent. That is, sampling prevalence observed in samples may be biased because, for example, frequency may vary throughout the study area. Indeed, the study area often is a subsample of the entire distribution range and thus is artificially defined by the modeler. In some cases, to maintain sampling prevalence it may be necessary to retain fewer samples, thus reducing sample size and accuracy of models (Hanberry et al., 2012a). Furthermore, if background data represent samples that do not contain the species, then prevalence will vary depending on resolution (i.e., number of samples) of the background data (Franklin et al., 2010). Lastly, adjustment from sampled prevalence that varies by species to a consistent modeling prevalence applied to all species may be necessary to standardize comparisons of model predictions among species or study areas or to map model predictions in similar and conventional classification units, such as equal intervals.
Modeling prevalence is the frequency of occurrence specifically selected for model training (Jiménez-Valverde et al., 2008). The most common recommendation is a balanced modeling prevalence of 0.50 (McPherson et al., 2004; Liu et al., 2005); however, researchers have suggested unbalanced modeling prevalence, such as 0.01 when sample sizes are limited (Lobo and Tognelli, 2011). Although adjustment of prevalence provides modeling flexibility, bias may reduce accuracy of models based on adjusted modeling prevalence more than models based on sampling prevalence, which differs less from natural species prevalence.
Published by Copernicus Publications on behalf of the European Ecological Federation (EEF).
B. B. Hanberry and H. S. He: Prevalence, statistical thresholds, and accuracy assessment
Thresholds, defined by the researcher, specify the predicted probability above which a species is determined present. Often 0.5 simply is used as the threshold for species presence, but it may not be the best choice (Liu et al., 2005; Freeman and Moisen, 2008). Liu et al. (2005) compared 12 different threshold selection methods and determined five methods that worked better than others to maximize both sensitivity (true positive rate) and specificity (true negative rate). One method was the use of modeling prevalence as the threshold. Another method was the use of mean predicted probability for the species as the threshold, but values a little lower than the mean also would be more likely to include the species than other threshold values. Three of the best five methods required sensitivity and specificity values, but specificity depends on known absences, which are uncertain for most datasets.
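As a minimal illustration of how a threshold converts continuous probabilities to presence or absence, the following Python sketch compares a constant 0.50 threshold with a threshold equal to modeling prevalence. The function name `classify` and the five probabilities are hypothetical, and treating a probability exactly at the threshold as absent is an assumption; this is not the authors' R code.

```python
def classify(probabilities, threshold):
    """Label a unit present (1) when its probability is above the threshold, else absent (0)."""
    return [1 if p > threshold else 0 for p in probabilities]


# Hypothetical predicted probabilities for five mapping units (not from the study).
probs = [0.10, 0.35, 0.50, 0.62, 0.90]

# Modeling prevalence: presences divided by all training cases (335 of 1000 here).
modeling_prevalence = 335 / 1000

constant = classify(probs, 0.50)                 # constant 0.50 threshold
matched = classify(probs, modeling_prevalence)   # threshold equal to modeling prevalence

print(constant)  # [0, 0, 0, 1, 1]
print(matched)   # [0, 1, 1, 1, 1]
```

With the lower, prevalence-matched threshold, two more units are classified as present than with the constant threshold.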
The most common accuracy assessment metric currently is the area under the curve (AUC) of the receiver operating characteristic (ROC), which plots true positive rate (sensitivity) against false positive rate (1 - specificity; commission error). The AUC values are threshold-independent, which may not necessarily be a beneficial attribute if (1) the accuracy metric is meant to measure presence and absence at a specified threshold dividing species presence from absence, (2) omission and commission errors are not equal in importance, and (3) there is inclusion of irrelevant prediction ranges (Lobo et al., 2007; Jiménez-Valverde, 2011). Furthermore, AUC values appear to be sensitive to the extent and location of species distributions (Elith et al., 2006; Termansen, 2006; Lobo et al., 2007; Raes and ter Steege, 2007). Pseudoabsences replace absences for most datasets, introducing uncertainty (Jiménez-Valverde, 2011). Therefore, researchers must reinterpret the ROC curve in the context of pseudoabsences selected from the background (Peterson et al., 2008; Jiménez-Valverde, 2011) or select pseudoabsences from surveyed plots rather than pseudoabsences from the unknown background extent (Hanberry et al., 2012b). Another option is to use the area of predicted presence as a proxy for commission errors (Engler et al., 2004; Hernandez et al., 2006) to match researcher-specified thresholds of species presence.
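The threshold independence of AUC can be seen from its rank interpretation: AUC equals the probability that a randomly chosen present case receives a greater predicted probability than a randomly chosen absent (or pseudoabsent) case. The sketch below computes it by direct pairwise comparison in Python; the function name `auc` and the example scores are hypothetical, and the study itself used the ROCR package in R.

```python
def auc(present_scores, absent_scores):
    """Share of present/absent pairs in which the present case scores higher (ties count half)."""
    wins = 0.0
    for p in present_scores:
        for a in absent_scores:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(present_scores) * len(absent_scores))


# Hypothetical predicted probabilities for reserved presences and pseudoabsences.
present = [0.9, 0.7, 0.4]
absent = [0.6, 0.3, 0.2, 0.1]

# 11 of the 12 pairs are ordered correctly; only (0.4, 0.6) is not.
print(auc(present, absent))
```

No threshold appears anywhere in the calculation, which is why a single AUC value cannot show how omission and commission errors trade off at the threshold actually used for mapping.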
Modeling prevalence, along with the selected threshold, may affect some accuracy metrics, perhaps due to bias away from natural species prevalence (Manel et al., 2001; Jiménez-Valverde and Lobo, 2006). Contradiction about the effects and bias of prevalence (Fielding and Bell, 1997; Manel et al., 2001; McPherson et al., 2004; Jiménez-Valverde and Lobo, 2006; Meynard and Kaplan, 2012) may be due to use of alternative types of prevalence (i.e., sampling prevalence vs. adjusted modeling prevalence) and varying threshold designations and accuracy assessment metrics. Therefore, we explored some unresolved issues about species distribution models, that is, (1) whether sampled prevalence produces less biased models than adjusted modeling prevalence, (2) whether modeling prevalence should be balanced at 0.5 or unbalanced (i.e., sampling prevalence or an alternative modeling prevalence), (3) whether threshold selection alone can account for sampling and adjusted modeling prevalence, and (4) the effects of prevalence, thresholds, and pseudoabsence type on accuracy assessment. It was not possible to maintain the sampling prevalence and keep a constant sample size of present cases; however, models were weighted averages of numerous models (e.g., ensemble trees in random forests) and the cumulative model had access to a greater modeling sample of present and absent cases. We adjusted modeling prevalence from 0.33 to 0.83, while keeping sample size of present cases constant. To examine effect of thresholds, we used a constant threshold of 0.50 and thresholds equal to prevalence. We based accuracy assessment on true positive rates, AUC values and true negative rates using surveyed and background pseudoabsences, and areal extent of predicted species presence.

Tree surveys
The USDA Forest Service Forest Inventory and Analysis (FIA) surveys fixed plots (composed of four subplots that are each 7.3 m in radius), which are located uniformly across the landscape, during a five-year cycle. The latest complete cycle for Minnesota's Laurentian Mixed Forest (Fig. 1) was during 2004-2008 and contained 2666 plots. Because the available FIA plot locations are perturbed to protect landowner privacy, the USDA Forest Service joined a set of predictor variables (described below) to plots to provide a table without revealing locations but based on accurate spatial locations for modeling and prediction.

Spatial units and environmental variables
The spatial mapping units or grain were Soil Survey Geographic (SSURGO) Database (Natural Resources Conservation Service, http://soildatamart.nrcs.usda.gov) polygons. Soil surveys have not been completed in Cook, Crow Wing, Isanti, Koochiching, Lake, Pine, and St. Louis counties, leaving a study extent of about 4 895 238 ha (Fig. 1). After removal of polygons that were water or otherwise miscellaneous areas (e.g., mines, pits, dumps), there were 310 000 polygons with a mean polygon area of 16 ha (SD = 92).

Statistical analyses and prevalence
We applied random forests (Breiman, 2001; Cutler et al., 2007), a classification method based on bootstrap aggregation (bagging) by the majority vote of many trees grown using random samples of both predictor variables and modeling data. We used the randomForest package (Liaw and Wiener, 2002) in R statistical software (R Development Core Team, 2010). We set the number of classification trees at 2000 and the number of variables randomly sampled at each split as the square root of the number of predictors.
For each species, we randomly selected 66 % of the plots that contained the species, to a maximum of 2500, for modeling, reserving the rest for accuracy assessment. For pseudoabsences, we selected up to 2500 plots that did not contain the species. Because random forests classification is an ensemble method that averages many trees, we worked around the limitations of sampled prevalence by allowing each classification tree iteration to draw from the complete modeling set. For sampled prevalence, we maintained the sampling prevalence for each species of 0.09-0.69, but we adjusted modeling prevalence with the sample size option (which is sampled without replacement). We varied the prevalence, using a moderate range from 0.33 to 0.83, representing ratios of pseudoabsence cases to present cases of 2 : 1 (prevalence of 0.33), 1 : 1 (0.50), 1 : 2 (0.67), 1 : 3 (0.75), 1 : 4 (0.8), and 1 : 5 (0.83). We held the present case sample size at 335 while varying the total sample size from 400 to 1000, i.e., we set the modeling prevalence at (1) 335 presences/1000 total cases (0.33 modeling prevalence), (2) 335 presences/700 total cases (0.50 modeling prevalence), (3) 335 presences/500 total cases (0.67 modeling prevalence), (4) 335 presences/446 total cases (0.75 modeling prevalence), (5) 335 presences/419 total cases (0.8 modeling prevalence), and (6) 335 presences/400 total cases (0.83 modeling prevalence). It was not possible to achieve a sample size of 335 present cases and maintain the sampling prevalence. Instead, sample sizes of present cases ranged from 164 to 1236 with a mean of 504 samples.
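The sampling scheme above can be sketched in Python as fixed-size draws, without replacement, from the present and pseudoabsent modeling sets. The presence count and totals come from the text; the plot identifiers, set sizes, seed, and function names are hypothetical, and the sketch only mimics the stratified draw rather than the randomForest package itself. It also prints the prevalence realized by each total, which differs slightly from the nominal ratios in some cases (for example, 335/700 is about 0.48).

```python
import random

N_PRESENT = 335                            # present cases per draw, from the text
TOTALS = [1000, 700, 500, 446, 419, 400]   # total cases per draw, from the text


def realized_prevalence(n_present, n_total):
    """Fraction of drawn cases that are presences."""
    return n_present / n_total


def draw_sample(present_ids, absent_ids, n_present, n_total, rng):
    """Draw fixed numbers of present and pseudoabsent cases without replacement."""
    n_absent = n_total - n_present
    return rng.sample(present_ids, n_present), rng.sample(absent_ids, n_absent)


rng = random.Random(0)                     # fixed seed so the sketch is repeatable
present_ids = list(range(500))             # hypothetical plot ids with the species
absent_ids = list(range(500, 3000))        # hypothetical plot ids without it

for total in TOTALS:
    present, absent = draw_sample(present_ids, absent_ids, N_PRESENT, total, rng)
    print(total, len(present), len(absent),
          round(realized_prevalence(N_PRESENT, total), 2))
```

Because every tree would receive a fresh draw, the ensemble as a whole can see more of the modeling set than any single tree does.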

Comparisons, thresholds, and accuracy assessment
We compared predicted probabilities for the same species for each prevalence using correlation (Proc Corr; SAS software, Version 9.2, Cary, North Carolina, USA). We also determined the mean of predicted probabilities for each prevalence. For thresholds to accept presence of a species, we used a constant 0.50 threshold and also thresholds that matched the modeling prevalence. We computed true positive rate, AUC, and true negative rate values (ROCR package in R; Sing et al., 2005) using reserved present samples and surveyed pseudoabsences from plots that did not have records of the species and also background pseudoabsences where the entire study extent was coded as absent (but we only modeled using surveyed pseudoabsences). We calculated area predicted as present and converted area from an absolute value to the fraction of the total area (4 895 238 ha). We examined the relationship between area predicted as present and true negative rate and AUC values (Proc Reg; SAS software, Version 9.2, Cary, North Carolina, USA).
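The threshold-dependent metrics described above can be sketched as follows in Python. The function names and all input values are hypothetical, and counting a probability exactly at the threshold as absent is an assumption; the study computed these values with the ROCR package in R.

```python
def true_positive_rate(present_probs, threshold):
    """Fraction of reserved present cases predicted present."""
    return sum(p > threshold for p in present_probs) / len(present_probs)


def true_negative_rate(absent_probs, threshold):
    """Fraction of absent (or pseudoabsent) cases predicted absent."""
    return sum(p <= threshold for p in absent_probs) / len(absent_probs)


def area_fraction_present(polygon_probs, polygon_areas_ha, threshold, total_area_ha):
    """Area of polygons predicted present, as a fraction of the total area."""
    present_area = sum(
        area for p, area in zip(polygon_probs, polygon_areas_ha) if p > threshold
    )
    return present_area / total_area_ha


# Hypothetical values for illustration only.
present_probs = [0.8, 0.6, 0.4, 0.9]
absent_probs = [0.1, 0.3, 0.55, 0.2, 0.45]
polygon_probs = [0.7, 0.2, 0.6]
polygon_areas_ha = [10, 30, 60]

print(true_positive_rate(present_probs, 0.5))                             # 0.75
print(true_negative_rate(absent_probs, 0.5))                              # 0.8
print(area_fraction_present(polygon_probs, polygon_areas_ha, 0.5, 100))   # 0.7
```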

Results
Independent of sampling prevalence and thresholds, correlations indicated that the relationship of the predicted probabilities differed only slightly with modeling prevalence values. Correlation values among models based on modeling prevalence (all species combined) at a constant sample size for present cases ranged from 0.86 (correlation between the most distant modeling prevalences of 0.33 and 0.83) to 1.00 (correlation between the closest modeling prevalences; Table 1). Models based on sampling prevalences were slightly less similar to models based on modeling prevalences, in part due to varying sample size. Correlation values ranged from 0.96 at the lower modeling prevalences to 0.83 at the greatest modeling prevalence. In addition, predicted probabilities increased with modeling prevalence. Mean predicted probabilities for all species combined increased from 0.25 for the modeling prevalence of 0.33 to 0.60 for the modeling prevalence of 0.83 (Table 2).
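High correlation and rising mean probabilities are compatible, because correlation is unaffected by a shift or rescaling of the predicted values. The Python sketch below illustrates this with a Pearson correlation, assuming that is the statistic reported (it is the default of Proc Corr); the probabilities and the linear shift are hypothetical and are not results from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical probabilities from a low-prevalence model for four mapping units.
low = [0.10, 0.20, 0.40, 0.70]
# Hypothetical high-prevalence model: same ordering, values shifted upward.
high = [0.20 + 0.80 * p for p in low]

print(sum(low) / len(low), sum(high) / len(high))  # the mean rises
print(pearson(low, high))                          # the correlation stays at about 1
```

Under such a shift the ranking of mapping units is unchanged, so a threshold that moves with prevalence can recover similar presence and absence predictions.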
Because predicted probabilities varied with prevalence, thresholds modified the trade-off between true positive rate and true negative rate due to prevalence, as balance in commission (false positive rate) and omission (false negative rate) errors changed (Fig. 2; mean values for all species combined). As prevalence (and predicted probabilities) increased and the threshold stayed constant, mean true positive rate and species prediction area increased while true negative rate decreased, creating large differences in values among prevalence ratios. Changing the threshold to match the modeling prevalence resulted in relatively similar true positive rate, true negative rate, and area for all prevalence values (Fig. 2). Models based on sampling prevalence performed similarly to models based on adjusted modeling prevalence.
Mean AUC values (all species combined) for the modeling prevalence models ranged from 0.92 to 0.94 using surveyed pseudoabsences (Table 2). The AUC values reflected true positive and true negative rates at the prevalence threshold (R² = 0.97), with a slightly greater influence by true negative rate (R² = 0.90) compared to true positive rate (R² = 0.79). Area was not correlated with AUC values (R² = 0.07) and true negative rate at the prevalence threshold (R² = 0.06). When using the background to provide pseudoabsences, AUC and true negative rate values were lower (Table 2). Area explained more of the variance in AUC values (R² = 0.35) and true negative rate at the prevalence threshold (R² = 0.41).

Discussion
There appears to be only a minor influence by prevalence, whether based on sampling or adjusted for modeling, on species distribution models for common species (and for uncommon species, Jiménez-Valverde et al., 2009). Models with a prevalence ratio ranging from (lower than) 0.33 to 0.83 will be highly correlated. Modeling prevalence therefore does not need to be balanced (as recommended by McPherson et al., 2004; Liu et al., 2005). However, better models may result from increased representation by the present case, rather than models with increased representation of the unknown case.
Threshold selection alone can account for changing values of prediction probabilities in species distribution models due to modeling prevalence. Using a constant threshold for different prevalence values clearly will affect accuracy metrics by minimizing either omission (at greater prevalence) or commission (at lower prevalence) errors. Retaining a threshold that is similar to prevalence will maintain fairly constant error rates, no matter the selected prevalence, similar to findings reported by Liu et al. (2005).
Summary statistics and thresholds provide ways to assess species distribution models. True positive rate, true negative rate, and AUC values are basic measures of whether the model is able to predict presence and absence of species. One AUC value may be more helpful than two summarized values of true positive rate and true negative rate, even though for most datasets, true absent cases are uncertain. Any research that uses AUC (or true negative rate) values must adjust commission error so that it does not reflect merely frequency of species when the background is coded as absent. If the background is coded as absent, AUC values indicate whether a species is widespread or restricted in range within the study extent; models for species with small or restricted ranges will have greater AUC values due to the match between low predicted probabilities and background areas scored as absent (Elith et al., 2006; Stokland et al., 2011). When we coded the background rather than surveyed areas as absent for assessment, albeit using surveyed pseudoabsences in modeling, the relationship between AUC values and area predicted as present increased. That is, R² increased from 0.07 to 0.35 in our study because R² values between true negative rate and area increased from 0.06 to 0.41. Additionally, it appeared that no matter the extent of commission error, if the true positive rate was high then the AUC value also was high (Wisz et al., 2008).
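The link between background pseudoabsences and range size follows from a simple identity: when every unit in the study extent is coded as absent, the true negative rate at a threshold is one minus the fraction of units predicted present. The Python sketch below demonstrates this for two hypothetical species; the function names and probabilities are illustrative, and units are weighted equally rather than by area, which is a simplifying assumption.

```python
def background_tnr(unit_probs, threshold):
    """True negative rate when every unit in the study extent is coded absent."""
    return sum(p <= threshold for p in unit_probs) / len(unit_probs)


def fraction_predicted_present(unit_probs, threshold):
    """Fraction of units in the study extent predicted present."""
    return sum(p > threshold for p in unit_probs) / len(unit_probs)


# Hypothetical predicted probabilities over ten equally weighted units.
restricted = [0.9, 0.8] + [0.1] * 8    # predicted present in 2 of 10 units
widespread = [0.9] * 7 + [0.1] * 3     # predicted present in 7 of 10 units

for name, probs in [("restricted", restricted), ("widespread", widespread)]:
    print(name, background_tnr(probs, 0.5), fraction_predicted_present(probs, 0.5))
```

The restricted species scores the higher true negative rate purely because less of the extent is predicted present, regardless of whether those predictions are correct.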
Greater predicted probabilities should be meaningful indicators of the probability of species presence; however, it is important to realize that factors can increase and decrease predicted probabilities. Increasing modeling prevalence increased predicted probabilities. Even though predicted probability values are consequential for a species, it is not certain what the predicted probabilities mean in terms of frequency, particularly when compared to other species (Lobo et al., 2007). For some conservation or monitoring goals, it may be useful to select areas where the species is very likely to be present at the expense of excluding potential areas of presence. Under those circumstances, setting the threshold at the mean predicted probability for known presences will target locations with greater probabilities of presence.

Conclusions
Modeling prevalence had a mild impact on species distribution models if thresholds for accepting species presence varied with prevalence and the effect of sample size was separated from modeling prevalence. We did not examine the effects of changing modeling prevalence for other statistical methods, and effects may differ if the method is not an ensemble method. However, Liu et al. (2005) had similar results for threshold selection using artificial neural networks as the statistical method. Modelers certainly should (1) specify the modeling prevalence and threshold for species presence and accuracy assessment and (2) standardize threshold values among modeled species so that models have similar balance between omission and commission errors. The AUC (and true negative rate) values appeared to be meaningful as measures of whether models produce greater predicted probabilities where a species was present than where it was unknown (Phillips et al., 2006). However, interpretation is misleading because, with the known commission error of scoring the entire background as absent, AUC (and true negative rate) values reflected the area predicted as present, that is, the frequency of species, rather than simply error. Any AUC values that trend inversely with extent of species distributions may be artifacts of pseudoabsence selection.
www.web-ecol.net/13/13/2013/ Web Ecol., 13, 13-19, 2013
Edited by: M. Bezemer
Reviewed by: G. Hengeveld and one anonymous referee

Figure 1. Study area (shaded black; contains soil surveys) in Minnesota's Laurentian Mixed Forest (shaded black and grey).

Figure 2. True positive rate, true negative rate, and area (fraction of total area) at 0.50 thresholds and prevalence thresholds. The prevalence threshold maintained true positive rate, true negative rate, and area at a relatively constant value among modeling prevalence ratios. Sampling prevalence (points located at the 0.28 x-axis, the mean sampling prevalence) did not affect performance.

Table 1. Correlation (all species combined) among predicted probabilities for varying modeling prevalences at a constant sample size of present cases and sampling prevalence (mean sampling prevalence = 0.28, range = 0.09-0.69).

Table 2. Mean values for predicted probability, true positive rate, AUC, and true negative rate for varying modeling prevalences and sampling prevalence (mean sampling prevalence = 0.28, range = 0.09-0.69).