Evaluating the performance of variable selection methods in modeling the site productivity of Oriental Beech (Fagus orientalis Lipsky)

Document Type: Research Paper

Authors

1 M.Sc. Student in Forestry, Faculty of Natural Resources, Tarbiat Modares University, Nour, I.R. Iran

2 Assist. Prof., Faculty of Natural Resources, Tarbiat Modares University, Nour, I.R. Iran.

3 Prof., Faculty of Natural Resources, Tarbiat Modares University, Nour, I.R. Iran

Abstract

Forest site productivity is an important criterion for forest managers. This study examined the predictive ability of statistical models built with various variable selection methods, using beech dominant height as an indicator of forest site quality in relation to edaphic and physiographic factors. For this purpose, 127 circular sample plots of 0.1 ha each were established in the forests of Tarbiat Modares University, and the dominant height of beech trees was calculated within each plot. The performance of five variable selection methods for multiple linear regression, together with regression trees, was evaluated. To compare these methods, cross-validation involving repeated splits of the dataset into training and validation subsets (2500 times) was used to obtain honest estimates of predictive ability. The results showed little difference in predictive ability among the five methods based on multiple linear regression. Stepwise methods performed similarly to exhaustive algorithms for subset selection, and the choice of criterion for comparing models (Akaike’s information criterion, Schwarz’s Bayesian information criterion or F statistics) had little effect on predictive ability. The method based on regression trees yielded substantially lower predictive ability. It is concluded that there is no single best method of variable selection and that any of the regression-based approaches discussed here is capable of yielding useful predictive models.
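The repeated-split validation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic, the two candidate models (a mean-only baseline and a one-predictor least-squares line) stand in for the regression and tree models of the study, and the function names, the 70/30 split fraction and the 200 repetitions are all assumptions chosen for brevity (the study used 2500 repetitions).

```python
import random
import statistics


def fit_simple_ols(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def rmse(pred, obs):
    """Root mean squared error between predictions and observations."""
    return (sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs)) ** 0.5


def repeated_split_rmse(xs, ys, n_splits=200, train_frac=0.7, seed=0):
    """Average validation RMSE of two candidate models over random splits."""
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    n_train = int(train_frac * len(idx))
    scores = {"mean_only": [], "linear": []}
    for _ in range(n_splits):
        rng.shuffle(idx)  # new random training/validation partition
        train, valid = idx[:n_train], idx[n_train:]
        x_tr, y_tr = [xs[i] for i in train], [ys[i] for i in train]
        x_va, y_va = [xs[i] for i in valid], [ys[i] for i in valid]
        # Candidate 1: predict the training mean for every validation plot.
        mean_y = statistics.fmean(y_tr)
        scores["mean_only"].append(rmse([mean_y] * len(y_va), y_va))
        # Candidate 2: one-predictor linear regression fitted on training data only.
        a, b = fit_simple_ols(x_tr, y_tr)
        scores["linear"].append(rmse([a + b * x for x in x_va], y_va))
    return {name: statistics.fmean(vals) for name, vals in scores.items()}


# Synthetic data (assumption): 127 "plots", one site factor, noisy linear response.
data_rng = random.Random(42)
x = [data_rng.uniform(0, 10) for _ in range(127)]
y = [20 + 1.5 * xi + data_rng.gauss(0, 2) for xi in x]

result = repeated_split_rmse(x, y)
print(result)
```

Because each model is fitted only on the training subset and scored only on the held-out subset, the averaged error reflects predictive ability rather than goodness of fit, which is the sense in which the abstract calls such estimates honest.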

Keywords