Estimating gypsum requirement under no-till based on machine learning technique

Chemical stratification, including that of pH, occurs under no-till systems, with higher pH levels at the soil surface than in the deeper layers. Subsoil acidity is a yield-limiting factor. Gypsum has been suggested when subsoil acidity limits crop root growth, i.e., when the calcium (Ca) level is low and/or the aluminum (Al) level is toxic in the subsoil layers. However, doubts remain about the most efficient methods to estimate the gypsum requirement. This study was carried out to develop numerical models to estimate the gypsum requirement in soils under no-till by means of Machine Learning techniques. Computational analyses of the dataset were performed with the M5'Rules algorithm, which is based on regression models. The dataset comprised soil chemical properties collected from no-till experiments in Southern Brazil that received gypsum rates on the soil surface, sampled over eight years after application. The results showed that the numerical models generated by the M5'Rules rule-induction algorithm were useful for estimating the gypsum requirement under no-till. The models showed that Ca saturation in the effective cation exchange capacity (ECEC) was a more important attribute than Al saturation for estimating the gypsum requirement in no-till soils.


INTRODUCTION
No-till (NT) systems with diversified crop rotations have stood out as one of the most effective strategies to improve the sustainability of agriculture in tropical and subtropical regions (HOBBS; SAYRE; GUPTA, 2008). Long-term NT systems are known to cause chemical stratification, including that of pH, with high pH levels forming in the upper few inches of the soil profile. Subsoil acidity is an important yield-limiting factor (CAIRES et al., 2008; DALLA NORA; AMADO, 2013; TANG et al., 2003).
Gypsum, a by-product of the phosphoric acid industry, mainly contains calcium sulfate with small amounts of P and F, and is widely available in many parts of the world. In Brazil, approximately 4.8 Tg are produced each year (RAIJ, 2008). When applied to the soil surface, gypsum moves down the profile during drainage, increasing the Ca supply and reducing toxic levels of Al. As a result, better root growth, increased uptake of water and nutrients by plant roots, and higher crop yields have been observed (BLUM; CAIRES; ALLEONI, 2013; DALLA NORA; AMADO, 2013; SORATTO; CRUSCIOL, 2008).
Gypsum application in Brazilian soils has been recommended when exchangeable Ca content is lower than 4 mmolc dm-3 (RAIJ et al., 1996; RIBEIRO; GUIMARÃES; ALVAREZ, 1999) or 5 mmolc dm-3 (SOUSA; LOBATO, 2002), exchangeable Al content is higher than 5 mmolc dm-3 (RIBEIRO; GUIMARÃES; ALVAREZ, 1999), and/or Al saturation is higher than 20% (SOUSA; LOBATO, 2002), 30% (RIBEIRO; GUIMARÃES; ALVAREZ, 1999), or 40% (RAIJ et al., 1996) in subsurface layers (20-40 or 30-50 cm). Although reasonable, these critical Ca and Al levels for root growth were based on few studies, and little is known about the benefits of using gypsum in NT soils that have no such limitations in deep layers. In a recent study (CAIRES et al., 2011a), the use of gypsum was economically viable for maximizing crop grain production in a long-term NT soil with a sufficient level of exchangeable Ca (8 mmolc dm-3) and low levels of exchangeable Al (4 mmolc dm-3) and Al saturation (15%) in the subsoil layers (20-60 cm).
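The published criteria listed above can be written as a simple decision check. The following is only a minimal sketch: the grouping of thresholds by source follows the citations in the text, while the names `Criteria`, `CRITERIA`, and `gypsum_recommended` are hypothetical, and the "and/or" in the text is read here as "any one condition is enough".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Criteria:
    """Critical subsoil levels quoted in the text for one source."""
    ca_min: float            # exchangeable Ca, mmolc dm-3
    al_max: Optional[float]  # exchangeable Al, mmolc dm-3 (None: no limit quoted)
    al_sat_max: float        # Al saturation, %


# Thresholds as cited in the text; the dictionary keys are illustrative labels.
CRITERIA = {
    "raij_1996": Criteria(ca_min=4.0, al_max=None, al_sat_max=40.0),
    "ribeiro_1999": Criteria(ca_min=4.0, al_max=5.0, al_sat_max=30.0),
    "sousa_lobato_2002": Criteria(ca_min=5.0, al_max=None, al_sat_max=20.0),
}


def gypsum_recommended(ca: float, al: float, al_sat: float, c: Criteria) -> bool:
    """Return True if any critical level of the given source is violated."""
    low_ca = ca < c.ca_min
    high_al = c.al_max is not None and al > c.al_max
    high_al_sat = al_sat > c.al_sat_max
    return low_ca or high_al or high_al_sat


# Subsoil levels quoted in the text for the soil of CAIRES et al. (2011a).
for source, criteria in CRITERIA.items():
    print(source, gypsum_recommended(ca=8.0, al=4.0, al_sat=15.0, c=criteria))
```

With the levels quoted for the soil of CAIRES et al. (2011a), none of the three sets of criteria would call for gypsum, which illustrates why the text questions whether these critical levels are sufficient for NT soils.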
A consequence of gypsum application to soils is the displacement of other cations on the exchange complex. Consequently, leaching of exchangeable Mg and K has often been observed in studies with gypsum application (CAIRES et al., 2011a,b). The leaching of Mg after gypsum application can be beneficial for Ca and K plant nutrition and crop yield when the soil has elevated Mg levels and a low Ca/Mg ratio in the most superficial layers (CAIRES; FELDHAUS; BLUM, 2001; CAIRES et al., 2004).
Considering that doubts still exist about the most appropriate methods for estimating the gypsum requirement (GR) (RAIJ, 2008), such estimates might be made with Machine Learning (ML) techniques, which are able to learn from facts and data and to handle new situations using reasoning and generalization.
ML can be used for pattern recognition with different purposes, including the prediction of continuous numeric values. Considering the positive results obtained with ML techniques in agronomy (GUIMARÃES; CATANEO; ZAZUETA, 2007; MEIRA; RODRIGUES; MORAES, 2008), we hypothesize that it is possible to develop numerical models to estimate GR in NT soils with the M5'Rules algorithm (HOLMES; HALL; FRANK, 1999), which is based on ML concepts.

Sites Description
The study was performed at three field sites located in the Center-South region of Paraná State, Brazil. The first site, here named site A, is located in Ponta Grossa (25°8' S, 50°15' W, average altitude of 853 m), on a loamy Typic Hapludox (295 g kg-1 of clay) with high acidity. Prior to the establishment of the experiment, the field had been under NT cultivation for 15 years. Table 1 shows the results of soil chemical analyses for different depths before the establishment of the experiment. Gypsum at rates of 0, 4, 8, and 12 t ha-1 was broadcast on the soil surface in 1993, and a randomized complete block design with three replications was used. Plot size was 50.4 m2 (8.0 × 6.3 m). Soil samples were taken at the 0-10, 10-20, 20-40, and 40-60 cm depths at 8, 20, 32, 44, and 56 months after gypsum application.
The second site, named site B, is also located in Ponta Grossa (25°10' S, 50°05' W, average altitude of 970 m), on a clayey Rhodic Hapludox (580 g kg-1 of clay) with medium acidity, previously used for pasture. Table 2 shows the results of soil chemical analyses for different depths before the establishment of the experiment. Gypsum at rates of 0, 3, 6, and 9 t ha-1 was broadcast on the soil surface in 1998, and a randomized complete block design with three replications was used. Plot size was 56.0 m2 (8.0 × 7.0 m). Soil samples were taken at the 0-10, 10-20, 20-40, and 40-60 cm depths at 8, 20, 32, 44, 56, 68, 80, and 92 months after gypsum application.
The third site, named site C, is located in Guarapuava (25°17' S, 51°48' W, average altitude of 1080 m), on a clayey Typic Hapludox (650 g kg-1 of clay) with low acidity. Prior to the establishment of the experiment, the field had been under NT cultivation for 15 years. Table 3 shows the results of soil chemical analyses for different depths before the establishment of the experiment. Gypsum at rates of 0, 4, 8, and 12 t ha-1 was broadcast on the soil surface in 2005, and a randomized complete block design with four replications was used. Plot size was 49.0 m2 (7.0 × 7.0 m). Soil samples were taken at the 0-10, 10-20, 20-40, and 40-60 cm depths at 8, 20, and 32 months after gypsum application.
According to the Köppen-Geiger system (PEEL; FINLAYSON; MCMAHON, 2007), the climate of the three field sites is Cfb, with mild summers and frequent frosts during the winter. The average annual air temperature is 17 °C; the average air temperature of the coldest month is 13 °C and that of the hottest month is not higher than 22 °C. The average annual rainfall is 1600 mm, concentrated between September and April.
The three field sites were cultivated under NT with crop rotation. Maize (Zea mays L.) or soybean (Glycine max (L.) Merrill) was sown in the spring-summer season, and black oat (Avena strigosa Schreb.), wheat (Triticum aestivum L.), or barley (Hordeum vulgare L.) in the autumn-winter season. Fertilizers were applied according to crop requirements and soil nutrient levels, as recommended for Paraná State. More details about the experimental sites and the effects of gypsum application on crop grain yields are reported in Caires et al. (1999), Caires, Feldhaus and Blum (2001), and Caires et al. (2002, 2004, 2011b).
For the soil samples taken at the three experimental sites, the exchangeable Al, Ca, Mg and K contents were determined according to the standard methods adopted by the Agronomic Institute of Paraná (PAVAN et al., 1992). The effective cation exchange capacity (ECEC) was calculated by summing the exchangeable cations (Al + Ca + Mg + K) and the Al, Ca, Mg and K saturation in
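As a minimal sketch of the calculation described above, the snippet below sums the exchangeable cations to obtain the ECEC and expresses each cation as a percentage of it. The Al formula follows the table footnote (Al saturation = 100 (Al/ECEC)); applying the same form to Ca, Mg and K is an assumption, and the function name and input values are hypothetical, not measured data.

```python
def ecec_and_saturation(al: float, ca: float, mg: float, k: float):
    """Return (ECEC, saturation dict in %) from exchangeable cation contents.

    All inputs must be in the same unit (e.g. mmolc dm-3).
    """
    cations = {"Al": al, "Ca": ca, "Mg": mg, "K": k}
    ecec = sum(cations.values())  # ECEC = Al + Ca + Mg + K
    if ecec <= 0:
        raise ValueError("ECEC must be positive")
    # Saturation of each cation in the ECEC, in percent.
    saturation = {name: 100.0 * value / ecec for name, value in cations.items()}
    return ecec, saturation


# Hypothetical contents, for illustration only.
ecec, sat = ecec_and_saturation(al=2.0, ca=10.0, mg=6.0, k=2.0)
print(f"ECEC = {ecec:.1f}")
for name, value in sat.items():
    print(f"{name} saturation = {value:.1f}%")
```

The four saturation values always sum to 100%, so a rule built on Ca saturation implicitly carries information about the other three cations.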

Data Analysis Method
The data were analyzed using ML techniques, adopting the M5'Rules algorithm as implemented in the Weka software (WITTEN; FRANK; HALL, 2011). This model tree induction algorithm for predicting numeric values has been reported to perform better than others (DUGGAL; SINGH, 2012) and is based on regression models. Each leaf node of the tree structure therefore holds a regression model, called a rule. In this study, each regression model (rule) estimated the GR from the Al, Ca, Mg and K saturation. The best rules for estimating GR, in t ha-1, were selected according to the correlation coefficient, the relative absolute error, the root relative squared error, and the total number of instances covered by the rule, which represents the confidence level of the rule. The M5'Rules algorithm is thus considered an effective alternative for data analysis, since it relies on sound statistical concepts and well-known ML techniques.
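The three error measures used to rank the rules can be sketched as follows. The definitions are the conventional ones for numeric prediction, in which errors are normalized by those of a predictor that always returns the mean of the observed values; they are assumed here to correspond to what Weka reports. The function names and the predicted values are hypothetical.

```python
import math


def correlation(actual, predicted):
    """Pearson correlation coefficient between observed and predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


def relative_absolute_error(actual, predicted):
    """Total absolute error relative to that of the mean predictor, in %."""
    mean_a = sum(actual) / len(actual)
    num = sum(abs(p - a) for a, p in zip(actual, predicted))
    den = sum(abs(a - mean_a) for a in actual)
    return 100.0 * num / den


def root_relative_squared_error(actual, predicted):
    """Root of the squared error relative to that of the mean predictor, in %."""
    mean_a = sum(actual) / len(actual)
    num = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    den = sum((a - mean_a) ** 2 for a in actual)
    return 100.0 * math.sqrt(num / den)


# Gypsum rates applied at site A (t ha-1) and hypothetical rule estimates.
applied = [0.0, 4.0, 8.0, 12.0]
estimated = [1.0, 4.0, 7.0, 12.0]
print(correlation(applied, estimated))
print(relative_absolute_error(applied, estimated))
print(root_relative_squared_error(applied, estimated))
```

Both relative errors fall below 100% only when a rule does better than simply predicting the mean gypsum rate, which makes them easier to compare across datasets than absolute errors.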

Site A
The analysis of all instances from the five sampling times (8, 20, 32, 44, and 56 months after gypsum application), covering the period from 1994 to 1998, did not generate efficient rules. When only the Ca saturation (CaSat) was considered across the whole soil profile (0-60 cm) during the same evaluation period, the GR estimate was affected by the sampling time and by the Ca saturation (r = 0.55) at the 20-40 and 40-60 cm depths (Table 4, R1). The GR estimate did not improve when the Al saturation (m) was considered at the 20-40 cm (r = 0.46) (Table 4, R2) and 40-60 cm (r = 0.36) (Table 4, R3) depths, nor when CaSat and m at all of the studied depths were considered together (r = 0.43) (Table 4, R4). Even with the interference of the time elapsed after gypsum addition, these results showed that the Ca saturation in the subsoil (20-40 and 40-60 cm) was a more adequate attribute for estimating GR than Al saturation, alone or in combination with the Ca saturation.
Increasing gypsum rates can cause leaching of exchangeable Mg and K in the soil (CAIRES et al., 2011a,b). A negative correlation between GR and the MgSat and KSat values would therefore be expected, mainly in the soil surface layers. In fact, the analysis performed for the period from 1994 to 1998 revealed that MgSat at the 0-10 and 10-20 cm depths was negatively related to GR (r = 0.42) (Table 4, R5). For KSat, the correlation obtained was much lower (r = 0.30) (Table 4, R6), showing a greater influence of gypsum addition on the leaching of exchangeable Mg than of exchangeable K, in agreement with the results of other gypsum studies (CAIRES et al., 2011a,b; ZAMBROSI; ALLEONI; CAIRES, 2007).
Because the long evaluation period made the sampling time affect the estimates, the analysis was repeated using only the instances from the first three sampling times (8, 20, and 32 months), across the whole soil profile (0-60 cm). For CaSat, the rule generated for estimating GR (Table 4, R7) showed a slightly better correlation (r = 0.57) than that obtained for the whole evaluation period. For Al saturation, the rule generated for estimating GR showed a better correlation (r = 0.59), with a demonstrated influence of the m values at the 40-60 cm depth (Table 4, R8). The correlations for the rules generated as a function of MgSat (r = 0.38) (Table 4, R9) and KSat (r = 0.26) (Table 4, R10) were not better than those observed for the whole evaluation period.
The highest maize grain yield in this soil was obtained after gypsum application at 9.5 t ha-1 (CAIRES et al., 1999), at the second evaluation time (20 months). Considering the best rules generated, a GR of 9.5 t ha-1 would be obtained for a Ca saturation in the ECEC of 64% at the 20-40 and 40-60 cm depths (Table 4, R1 and R7) and for an Al saturation of 6.5% at the 40-60 cm depth (Table 4, R8). Thus, the maximum maize grain yield should have been achieved with a Ca saturation of 64% or an Al saturation of 6.5% in the subsoil. It is noteworthy that the use of gypsum was economically viable for maximizing crop grain yield in an NT soil with low levels of Al saturation (15%) in the subsurface layers (20-60 cm) (CAIRES et al., 2011a), in agreement with the results obtained in this study. Since the generated rules admit other combinations of Ca saturation in the 20-40 and 40-60 cm layers that give the GR resulting in the maximum maize yield (9.5 t ha-1), they could serve as alternatives for estimating GR.

Site B
The analysis of all instances from the eight sampling times (8, 20, 32, 44, 56, 68, 80, and 92 months after gypsum application), covering the period from 1999 to 2006, did not generate efficient rules, as was the case at site A. When only the Ca saturation (CaSat) was considered across the whole soil profile (0-60 cm) during the same evaluation period, GR was influenced by the sampling time and by the Ca saturation (r = 0.58) at the 0-10 and 40-60 cm depths (Table 5, R11).
Estimating GR from the Al saturation (m) in the soil profile was worse, but the correlation for the generated rule was slightly closer (r = 0.60) when considering the CaSat and m at the studied depths

Site C
layers, there was no change in the rule generated for the Ca saturation when Al saturation was included in the analysis.
The GR estimate was correlated (r = 0.69) with the sampling time and the Mg saturation at the 0-10 and 10-20 cm depths (Table 6, R22) during the period from 2006 to 2008. According to the rule, higher GR would correspond to lower Mg saturation values at the 0-10 and 10-20 cm depths. This effect results from the leaching of exchangeable Mg that occurs in the soil after gypsum addition (CAIRES et al., 2011a,b). The influence of the sampling time, in this case, shows that the leaching of exchangeable Mg after gypsum application continued during the evaluated period, further reducing the Mg saturation at the 0-10 and 10-20 cm depths.
For K saturation, the rule generated for the GR estimate (Table 6, R23) proved valid (r = 0.60) only for the case of K saturation 4.15% at the 20-40 cm depth. In any case, the rule indicates that higher GR corresponds to lower K saturation values, particularly in the soil surface layer (0-10 cm), which would be related to the leaching of exchangeable K in the soil caused by gypsum application (CAIRES et al., 2011b).
The highest maize grain yield in this soil was obtained after gypsum application at 7.8 t ha-1 (CAIRES et al., 2011b), at the first evaluation time. Considering the rule generated from the Ca saturation for the period from 2006 to 2008 (Table 6, R21), a GR of 7.8 t ha-1 would be obtained for a Ca saturation in the ECEC of about 80% at the 0-10 cm depth and 70% at the 10-20 cm depth. The rule admits other combinations of Ca saturation at the 0-10 and 10-20 cm depths that give the GR resulting in the highest maize grain yield (7.8 t ha-1), but it is clear that the Ca saturation in the surface layers is an important component of the GR estimate in soil without Al toxicity problems in the profile. Another study under NT verified the economic viability of using gypsum to maximize grain crop production in a soil with Ca saturation in the ECEC ranging from 59 to 67% at the 0-10 cm depth and from 47 to 57% at the 10-20 cm depth (CAIRES et al., 2011a), which agrees with the results obtained in this study.

Sites A, B, and C
The joint analysis of all instances from the three sites (A, B, and C), considering the first three sampling times (8, 20, and 32 months after gypsum application) and the four sampling depths (0-10, 10-20, 20-40, and 40-60 cm), showed that the rules based on the Ca saturation in the ECEC and the Al saturation (m) had the closest correlation with the GR estimate (Table 7). Even when all depths of the soil profile (0-60 cm) were considered, the rule generated for the GR estimate showed a strong correlation (r = 0.70) with the Ca saturation only at the 0-10 cm depth (Table 7, R24).
The rule generated from Al saturation (m) alone was related to the m value at the 40-60 cm depth (Table 7, R25), but the correlation was very weak (r = 0.26). When the Ca saturation and the Al saturation (m) were combined, the generated rule showed a strong correlation (r = 0.78) with m at the 0-10 cm depth and Ca saturation at the 0-10 and 10-20 cm depths (Table 7, R26). The relation of the rule with m in the soil surface layer (0-10 cm) is certainly due to the higher soil acidity of sites A and B compared with site C. Because problems related to the presence of Al in the soil surface layer are easily corrected by applying lime, the Ca saturation in the soil surface layers is evidently important for estimating GR when the soil acidity is corrected by liming.
For the three studied sites, the highest grain yields of maize, wheat or barley occurred with GR ranging from 7.8 to 9.5 t ha-1 (CAIRES et al., 1999; CAIRES; FELDHAUS; BLUM, 2001; CAIRES et al., 2002, 2004, 2011b). Since the GR was strongly correlated with the Ca saturation in the ECEC in the 0-10 cm layer (Table 7, R24), it is possible that the highest crop grain yields under NT systems would be obtained with Ca saturation in the ECEC in the soil surface layer (0-10 cm) ranging from 78 to 84%.

CONCLUSIONS
1. The regression models generated by the M5'Rules rule-induction algorithm were useful for determining the gypsum requirement of NT soils.
2. The models showed that Ca saturation in the ECEC was a more important attribute than Al saturation for estimating the gypsum requirement in NT soils.

Table 1 - Results of soil chemical analyses for different depths before the establishment of the experiment at site A
† Base saturation = 100 (Ca + Mg + K)/CEC at pH 7.0. ‡ Al saturation = 100 (Al/ECEC).

Table 2 - Results of soil chemical analyses for different depths before the establishment of the experiment at site B
† Base saturation = 100 (Ca + Mg + K)/CEC at pH 7.0. ‡ Al saturation = 100 (Al/ECEC).

Table 3 - Results of soil chemical analyses for different depths before the establishment of the experiment at site C
† Base saturation = 100 (Ca + Mg + K)/CEC at pH 7.0. ‡ Al saturation = 100 (Al/ECEC).

Table 6 - Rules generated to estimate gypsum rate based on site C dataset

Table 7 - Rules generated to estimate gypsum rate based on sites A, B, and C dataset
Alaine Margarete et al.