An approach to the modeling of honey bee colonies

In this work, populations of adult and immature honeybees and their honey production are studied through mathematical and statistical modeling approaches. Those models are complementary and are presented in disjunct form. They were used to show different modeling methods for honey bee population dynamics. The statistical approach consisted of a generalized linear model using data from the Department of Agriculture of the United States of America (USDA), which showed that the relationship between the number of colonies and the rate of honey production is not constant in time but decrease over the years. These models showed that when a bee population is subjected to a stress factor (i.e., habitat destruction, Varroa mite, climate variability, season, neonicotinoids, among others), the abundance of individuals decreases over time as well as the honey produced by the colonies. Finally, the mathematical approach consisted of two models: (1) a smooth model, in which conditions of existence and stability of the equilibrium solutions are determined by an ecological threshold value, and (2) a non-smooth model where the mortality rate of bees is included as a function of the number of adult bees in the population.

8 J. P. Romero-Leiton et al.: An approach to the modeling of honey bee colonies 2018; Ulloa et al., 2010;Farouk et al., 2014). The utility of honey is diverse. For example, it works as an anti-flu agent (Farouk et al., 2014). Honey also serves as a natural source of antioxidants which are effective in reducing the risk of heart and immune system diseases, cataracts and various inflammatory symptoms (Vit, 2004). Regarding the honey trade, the United States of America is the world leader in imports. Honey imports increased at a rate of 6956 t yr −1 , making this country the main honey importer in the world (García, 2018). However, regarding production, China is the world leader (466.6 kt), followed by Turkey (94.7 kt), Iran (74.6 kt), Ukraine (73.7 kt) and the Russian Federation (68.4 kt) (Güngör, 2018).
Despite the great importance of bees in the world, an alarming concern has been raised in the last 20 years because of the decrease in the number of honey bees worldwide, which affects the quality of life of all human populations that depend (directly or indirectly) on them (Ratnieks and Carreck, 2010;Cox-Foster and VanEngelsdorp, 2009;Nates-Parra, 2016).
For these reasons, it is necessary to develop studies on honey bee population dynamics aiming to conserve and sustainably manage this species. Research in the area of mathematics and statistics has contributed to the development of models that allow us to understand the population dynamics of bee colonies (Belić et al., 1986;Schmickl and Crailsheim, 2008;Khoury et al., 2011;Russell et al., 2013;Jatulan et al., 2015;Dennis and Kemp, 2016;Booton et al., 2017). Schmickl and Crailsheim (2007) created a simple mathematical bee population model (HoPoMo) using differential equations to predict the population dynamics and resource fluctuations of a bee colony. Their model emphasized the importance of pollen supply (Bagheri and Mirzaie, 2019) and presented a mathematical model considering pollen and nectar as necessary foods for the colony. This model was developed based on the differential equations proposed by Khoury et al. (2013). Schmickl and Crailsheim (2007) built one of the most detailed population models of honey bee colony dynamics, consisting of 60 equations that track the life of a bee from an egg to an adult bee every day. The model considered the effect of seasonal changes in egg-laying rate, nurse bees on larval survival and pollen shortage on cannibalization. Russell et al. (2013) developed a model that focused on how internal demographic processes within a colony interact with food availability and brood rearing to alter growth in forager population size. The model was implemented as a series of differential equations based on the rate equations from the analytical models proposed by Khoury et al. (2013).
In this work, we describe from mathematical and statistical perspectives the relationship between the population size of honey bees (Apis mellifera) and honey production when the bee colony is subjected to some stress factor (habitat destruction, climatic variability, neonicotinoids, etc.) that externally causes the death of individuals and consequently the possible decrease in honey production. We present three different approaches. Two of them are purely mathematical and consider the interaction between immature and adult bees and the amount of honey produced using ordinary differential equations (ODEs).
First, we consider a smooth model in which a stability analysis of the equilibrium solutions is performed. The stability results are validated with numerical experiments using data obtained from the literature. Next, we consider a nonsmooth model which is analyzed using Filippov's systems theory (Dieci and Lopez, 2009). Finally, a statistical model is developed considering honey production as the response variable, the number of colonies as a the main effect predictor, and year, latitude and area as covariates, using a generalized linear model (GLM).

The statistical model approach
For this section, we used the historical data of bee farming in the United States of America (USA) from 1985 to 2019, from the US Department of Agriculture (USDA), which are publicly available on USDA web page (https: //usda.library.cornell.edu/concern/publications/hd76s004z? locale=en&page=3#release-items, last access: August 2022). Additionally, for a detailed revision, the R codes and the cleaned data set are available at the following GitHub link: https://github.com/jpatirom3/Honey-bees-modeling-(last access: March 2022). These data sets summarize the statistics of honey production and number of bee colonies by state and year in the USA. We used these data sets because they offer the most complete, comprehensive and publicly available data on honey production in the world. After checking the data sets for missing data, we decided to use the data from 39 states, which had complete values for all years.
A first exploration of these data sets revealed a significant decreasing trend in honey production over the years (Fig. 1), which is an expected pattern due to the worldwide decrease of working honeybees during the last 20 years (Bretagnolle and Gaba, 2015). Since this kind of temporal trend can inflate the relationship between responses and predictors in statistical models (Wooldridge, 2015), we fitted a model for each of seven 5-year sub-series (1985-1989, 1990-1994, 1995-1999, 2000-2004, 2005-2009, 2010-2014 and 2015-2019), in such a way that the temporal trend within each sub-series was significantly reduced in comparison to the full 1986-2019 series. Thus, every model was able to be built on n = 194 data points, which is a reliable sample size for parameter estimation. This approach also allowed us to evaluate the temporal change of the parameters over time, which is important due to the already mentioned decline of bees.
Thus, our modeling approach considered honey production (HP) (pounds) as the response variable for each of the seven 5-year sub-series; number of colonies (NC) as the main effect predictor; and year (Y ), latitude (LAT) and area (AR) (hectares) of each state as covariates. Since NC and HP exhibited highly skewed distributions, we used an additive GLM log-log with a logged response and predictor in order to achieve normal and homoscedastic residuals (Quinn and Keough, 2002). Equation (1) shows the specification of this model, where e is an error term with Gaussian distribution and Id is an identity link function.
Estimation of β parameters was performed trough the iterative reweighted least squares method (IWLS) (Stirling, 1984). Different combinations of covariates were fitted in a top-down stepwise modeling approach, which started by fitting a saturated model with all covariates and ended up with a reduced final model with a minimum number of covariates that captured most of the variance in the response variable ( Table 1). The Akaike information criterion (AIC) and the root mean square error (RMSE) were used as metrics of performance and prediction accuracy to compare among models and define the final models for each year's sub-series. β 2 , of main interest for this section, was interpreted as a percentage unit change in HP caused by a percentage unit change in NC, while keeping the other covariates parameters constant. All these procedures were done in R studio (R Team, 2019). Table 1 shows the results of the statistical modeling processes in terms of analysis of deviance. Covariates in the final reduced models were Y and LAT. AR was dropped because AIC values did not decrease in any of the models that included it. Figure 2 shows the predicted values of the models as a function of NC and HP. In general, the performance and prediction accuracy of all models were satisfactory, since R 2 ranged between 60 % and 93 %. Parameter estimation showed that β 2 , of main interest for our study, reduced over the years from 1.14 ± (0.06) in 1985-1989 to 1.05 ± (0.02) in 2014-2019 ( Fig. 3) (values in brackets represent 95 % con- Table 1. Results of analysis of deviance for the statistical models fitted to the data for every 5-year sub-series. Residual deviance represents the degree in which the response variable can be predicted by model predictors. The higher the value, the better the model is able to make predictions. The F value is the rate of model variance; it provides a way to compare the fitted model against a model with only intercept and no predictors. Higher F values represent better fits of the data to the model. F values provide the significance of the effect of each predictor in the models. Values equal to or lower than 0.05 represent a significant effect. R 2 is the coefficient of determination; it represents the percentage of variance in the response variable that is explained by the fitted model. fidence intervals). This is a reduction of 8 % in the effect of NC on HP. These values can be interpreted in such a way that during 1985-1989, a 1 % increase in NC caused an increase of 1.1 % to 1.14 % in HP, whilst in 2014-2019, a 1 % increase in NC caused an increase of 1.03 % to 1.07 % in HP.

The smooth mathematical model approach
In this section we formulate a smooth mathematical model that describes the interaction among immature bees at time t(B(t)), adult bees at time t(T (t)) and the amount of honey produced at time t(M(t)). We assumed that immature bees grow at a rate β, proportionally to the number of adult bees. This is represented by term T /(T + ν), where ν is the average saturation rate (number of adult bees needed for immature bees to reach the half of its maximum number). The number of bees reaching the adult stage affects the number of immature bees. This is modeled by term ωB, where ω rep-  resents the maturation rate to the adult stage, and 1/ω represents the time spent before reaching the adult state. Additionally, the number of immature bees is reduced by natural mortality. This is modeled by term µ B B, where µ B represents the natural death rate of the immature stage. In this way, the following ODE is proposed to model the variation in the number of immature bees at time t: Similarly, we assumed that the number of adult bees decreases naturally. This is represented by term µ T T , where µ T is the natural death rate of the adult stage. Additionally, the bees can also die due to a stress factor. This was modeled by the term σ T , where σ is the death rate due to a stress factor (loss of habitat, pesticides, climate change, mismanagement by the beekeeper, among others) on bees at the adult stage. Thus, the number of adult bees at time t can be represented by the following ODE: Finally, the production of honey in the hives increases at a rate ρ, which depends on the number of adult bees. This is represented by the term T /(T + u), where u is the average saturation rate. The amount of produced honey decreases due to different factors; one of them is the loss of honey due to natural causes such as support for the hive and feeding of immature bees, which is represented by the term αM, where α is the rate of honey loss. Another cause is the consumption by adult bees, which is modeled by term δT M, where δ is the rate of honey consumption by adult bees. Therefore, the following ODE models the variation in the proportion of existing honey at time t: From Eqs.
(2)-(4) we obtain the following system of ODEs: Model (Eq. 5) can be represented in Fig. 4. The specifications of parameters involved in the model (Eq. 5) are shown in Table 2.
Let us define = min{µ B , µ T }. Then, the set of biological interest of the model (Eq. 5) is given by The following lemma ensures that set has biological sense; that is, all solutions starting in remain there for all t ≥ 0.
Lema 3.1. The set defined in Eq. (6) is positively invariant for the solutions of the system (Eq. 5).
The proof of the above lema can be seen in Appendix A.

Equilibrium points
For calculating the equilibrium points and stability of the model (Eq. 5), it should be clarified that all parameters given in Table 2 are strictly greater than zero. The equilibrium points of the system (Eq. 5) are given for the solution of the following system of algebraic equations Details about the solution of the previous system can be found in Appendix A. We get the following proposition. Proposition 3.2. Model (Eq. 5) always has a trivial equilib- where b * and m * are defined in Eqs. (A4) and (A5), respectively.

Local stability analysis
Here, we determine the conditions for the local stability of the equilibrium solutions given in Proposition 3.2. We use the linearization of the vector field given by the right-hand side of the system (Eq. 5), which is given by the following Jacobian matrix: The stability of each equilibrium point given on Proposition 3.2 can be found by evaluating the above matrix in each equilibrium point. To see the mathematical details see Appendix A. We obtain the following proposition. Proposition 3.3. If λ * < ν, the trivial equilibrium E 0 = (0, 0, 0) is locally and asymptotically stable, whereas if λ * > ν, it is unstable. If λ * = ν, E 0 is a non-hyperbolic equilibrium point. Similarly, If λ * > ν, the coexistence equilibrium E 1 = (b * , λ * − ν, m * ) is locally and asymptotically stable, whereas if λ * < ν, it is unstable. If λ * = ν, E 1 is a nonhyperbolic equilibrium point.
A summary of the results from Proposition 3.3 is given in Table 3. Remark 3.4. The existence and stability conditions of the equilibrium points of the system (Eq. 5) are given in terms of an ecological threshold λ * defined in Eq. (A3), which can be rewritten as Since µ T + σ represents the total death rate of adult bees and ω + µ B the total death rate of immature bees, the expression ω/µ T + σ can be interpreted as the proportion of immature bees that equals the number of adult bees when these remain constant. Similarly, β/ω + µ B is the proportion of adult bees that equals the number of immature bees when these remain constant. Thus, λ * is the total net number of bees produced by the colony during the lifetime of the hive. If the total death rate of adult bees is large enough, it would lead to the collapse of the hive because the term λ * would approach to zero. This is because the existence condition of the coexistence equilibrium states that λ * > ν, where the constant ν is the number of adult bees required for the growth rate of immature bees to reach half its maximum value. Average saturation rate population Table 3. Condition of the existence and stability of the equilibrium points of the model (Eq. 5).

Numerical simulations
In this section we present some numerical experiments that illustrate the behavior of the hive and honey subjected to different parameter values. Some values were taken from the literature and others were assumed. In Table 4 we show the rank of the parameter values included in the model and their respective references. In Fig. 5 we present the stability of the equilibrium E 0 , which is obtained with a higher value for the reproduction rate of immature bees. Here, σ = 0.8 (considered a high value), ρ = 0.13, µ B = 0.11, µ T = 0.29, ω = 0.95, α = 0.95, δ = 0.571, u = 1, ν = 1 and β = 0.92. With these values the number of bees and their honey will constantly decrease until a collapse. Figure 6 numerically represents the stability of the equilibrium E 1 . Here, the mortality rate due to a stress factor was reduced to an average value of σ = 0.4, and the honey production rate was increased to ρ = 0.23. The loss of honey due to natural causes has also increased to α = 0.1. By including the above changes in parameter values, the amounts of immature adults and honey production remain constant in B = 0.14, T = 0.2 and M = 0.289. However, this is less likely to occur because, due to multiple factors, the hives tend to collapse after a certain time.

The non-smooth mathematical model approach
Some mathematical models whose arguments are described by piecewise smooth functions have numerous dynamics and applications, as well as an extensive mathematical structure, even though their behavior is not simply described in terms of modern qualitative theory of dynamical systems. However, this classical theory of dynamic systems does not include a significant number of phenomena that occur in practice. These phenomena are defined with functions that are non-smooth in their arguments, such as electrical circuits in which there are commutations (switches), control systems, models in different sciences and population models in biology, where some continuous changes can generate discrete actions. Such is the case of bee populations, in which different stress factors can cause some thresholds of mortality and affect both the number of bees in the immature state, as well as in the adult state, and therefore honey production. Thus, in this section we use the Filippov systems (Dieci and Lopez, 2009) to study the system (Eq. 5), where R 1 and R 2 are considered as death thresholds. If the number T of adult bees is big enough from a R 2 , the death of bees due to a stress factor (σ ) will be maximum (σ max ). Similarly, from a R 1 , if the number of adult bees is small enough, the death rate due to a stress factor will be minimal (σ min ). Finally, if we consider that the number of adult bees is in equilibrium, that is between R 1 and R 2 , the death rate due to a stress factor will be stable (σ 1 ). The previous consideration can be mathematically written as follows: The main characteristic of the Filippov systems is the subdivision of the state space into disjoint subregions, and a smooth vector field is defined within each one of them (Amador et al., 2021). The limits between the different regions are called discontinuity surfaces, which are usually denoted as i (Hogan et al., 2016).
Mathematical details about this approach can be seen in Appendix B. However, this presents only a mathematical understanding of Filippov's systems; a biological sense of this will be determined in future work, as well as the study of the presence of pseudo-equilibrium points.

Numerical simulations
The numerical simulations of this section were performed using the MATLAB interface, and the codes can be consulted at the following GitHub link: https://github.com/jpatirom3/ Honey-bees-modeling-(last access: March 2022). Numerical simulations for the described systems are presented below. Figure 7 shows the trajectory of the solutions of the states B(t), T (t) and M(t). In this figure, we take B(0) = 1, T (0) = 1 and M(0) = 1 as initial conditions. Table 5 summarizes the values used for each parameter. Figure 8 shows the switching surfaces 1 (left side) and 2 (right side), which divide the first octant into the following regions: S 1 (left side of 1 ), S 2 (region between 1 and 2 ), and S 3 (right side of 2 ), as well as the phase portrait. The trajectories show that regardless of the initial conditions or the region of start, there is always a crossover in the switching planes, which is proven analytically with the Filippov analysis as shown above. After some time, the trajectories remain in regions S 2 and S 3 , which means that the number of adult bees is switching within these regions limited by the switching surface 2 . Finally, the number of adult bees seeks stability at the break-even point, as shown by Fig. 7.

Discussion
In this work, we attempted to contrast three different modeling approaches to understand the population dynamics of immature bees, adult bees and the honey produced in the hive. We first formulated a smooth mathematical model in which conditions of existence and stability of equilibrium solutions were found. Afterwards, we adapted the previous model to a non-smooth model which was studied using Filippov's systems theory. Finally, we analyzed a honey bee data set obtained from USDA, using a GLM. From the smooth model, we obtained two equilibrium points: (1) a coexistence between the number of bees and honey production and (2) a collapse in both the number of bees and honey production. An important aspect to highlight here is that the existence and stability of these equilibrium points are in terms of the ecological threshold λ * , given in Remark 3.4, which represents the net number of bees produced within the colony over the lifetime of the hive. Additionally, the numerical experiments of these two points suggested that a high death rate due to a stress factor (σ = 0.8), which was used in Fig. 5, leads to a collapse of the hive in 50 weeks.
Subsequently, a smooth model was proposed in which two death thresholds R 1 and R 2 were considered for the stress factor. The Filippov analysis for the non-smooth model showed that there is always a crossover in the switching surfaces. After 100 weeks, the trajectories switched between regions S 2 and S 3 , which defined a normal and high stress rate respectively, as it was shown in Fig. 8. Finally, the orbits stabilized at the equilibrium point located in region S 2 . We have taken into account the fact that the equilibrium point E 1 is stable, and, therefore, the equilibrium value is reached when t → ∞. Parameter   Table 5.
For the statistical approach, generalized linear models were built to study seven 5-year sub-series of colony abundance and honey production data (1985 to 2019) from the USDA. We observed that the number of colonies required to produce a same amount of honey is not constant but reduced over the years as shown by Fig. 3. This points to the influence of a stress factor, such as the change of natural habitats to mono-cultures (Belsky and Joshi, 2019) or the infestation by the ectoparasitic Varroa mite destructor (Guzmán-Novoa et al., 2010;Martin et al., 2012).
From these three modeling approaches, we conclude that honey production is reduced due to the decrease in the total number of bees within the colony and due to an external stress factor, which could be climate variability, pesticides, parasites such as the Varroa mite, or habitat loss, among other things. This can be observed in Fig. 6, in the simulations of the non-smooth model (Eqs. 7 and 8), and mainly in the statistical results in Fig. 3. From numerical experiments it was observed that the amount of honey produced is considerably reduced in time; however, it does not fully collapse but stay constant.

Appendix A
Proof of Lema 3.1. Proof. It is clear that T T +ν ≤ 1. Adding the first two equations of the system (Eq. 5) we obtain Therefore, M ≤ ρ/α, which concludes the proof.

A1 Equilibrium points
To solve the algebraic system given on Eq. (7), we proceed as follows.
From the second equation in Eq. (7), Replacing B defined in Eq. (A1) in the first equation of the system (Eq. 7) we obtain From the previous equation we have T = 0 or T = 0. First, let us suppose that T = 0, thus where .
Replacing T defined on Eq. (A2) into the third equation of the system, Eq. (7), we have that Now, replacing T defined on Eq. (A2) into the Eq. (A1), we obtain that Note that a sufficient condition for T defined in Eq. (A2), M defined in Eq. (A4) and B defined in Eq. (A5) to be positive is that λ * > ν. Now, if T = 0 in Eq. (A2), we obtain the trivial equilibrium solution E 0 = (0, 0, 0).

A2 Local stability analysis
The stability of E 0 can be found by evaluating the above matrix in E 0 . Thus, An eigenvalue of the previous matrix is ξ = −α, whereas the other two eigenvalues are given by the roots of the following characteristic polynomial: From the Routh-Hurwitz criterion (DeJesus and Kaufman, 1987), we have that the roots of the above polynomial have negative real parts, if and only if the following conditions are met: It is clear that 1 > 0 and 2 > 0, if and only if λ * < ν.
Now, we will determine the stability conditions for the equilibrium E 1 . Evaluating matrix (Eq. 8) in E 1 we obtain An eigenvalue of the previous matrix is η = −α − δ(λ * − ν), which is negative if λ * ≥ ν, while the other two eigenvalues are given by the roots of the following characteristic polynomial: Similarly, from the Routh-Hurwitz criterion (DeJesus and Kaufman, 1987) the roots of the above polynomial have negative real parts, if and only if the following conditions are met: It is clear that 1 > 0 while 2 > 0, if and only if λ * > ν.

Appendix B
We assume that the state space consists of the three regions S 1 , S 2 and S 3 ⊂ R 3 , which are separated by two discontinuity surfaces i with i = 1, 2, defined as the null set of a smooth scalar function H (x) : R 3 → R, in such a way that the equations (Eq. 5) under the conditions (Eq. 9) are the following smooth functions: F 1 (B, T , M), F 2 (B, T , M) and F 3 (B, T , M) determined by the value of T according to Eq. (9). If T < R 1 , we have When R 1 ≤ T ≤ R 2 , we have that These three functions are also assumed to be defined through the state space, even though they are only used in their respective regions (Amador et al., 2013). Thus, the regions S 1 , S 2 and S 3 can be defined by the following sets: The regions are separated by a switching surface: limit of discontinuity represented by the scalar function As x = (B, T , M) ∈ R 3 , the gradient of H (i) (x) can be defined as follows: Thus, taking the regions i in which the gradient does not null in the set i = {x ∈ R 3 : H (i) (x) = 0, i = 1, 2}, for x ∈ 1 , we define and, for x ∈ 2 , Filippov's systems theory (Amador et al., 2017) states that at points x, where ϕ (i) (x) > 0, i = 1, 2, there is a crossing point, ϕ (i) (x) = 0, i = 1, 2, there are tangential points, ϕ (i) (x) < 0, i = 1, 2, there is a sliding point, with the equationẋ = g i (x), x ∈ i .
For x ∈ 1 , we have the following cases.
Data availability. The historical data of bee farming in the United States of America (USA) from 1985 to 2019, from the US Department of Agriculture (USDA), used in this paper are publicly available on the USDA web page (https://usda.library.cornell.edu/concern/publications/hd76s004z? locale=en&page=3#release-items, National Agricultural Statistics Service, 2021).
Author contributions. JPR performed the conceptualization, model formulation, formal analysis, supervision, validation, and writing of the original draft and developed the methodology and software. AG performed the investigation and data collection, developed the methodology, and participated in the review and editing. IFB performed the data curation, visualization, and statistical analysis; developed the R code; and participated in the review and editing. OM performed the non-smooth model analysis, developed the MATLAB code, and participated in the review and editing. AP performed the model formulation, carried out the funding and resource acquisition, participated in the project administration and in the review and editing.
Competing interests. The contact author has declared that neither they nor their co-authors have any competing interests.
Disclaimer. Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.