Monthly Stream Flow Prediction Using Support Vector Machine Based on Principal Component Analysis

Document Type: Technical Note


1 Member of the Water Research Institute, Ministry of Energy; Ph.D. Student of Environmental Eng., Graduate Faculty of Environment, University of Tehran

2 Managing Director of CELCO; Ph.D. Student of Environmental Eng., Graduate Faculty of Environment, University of Tehran

3 Ph.D. Student of Hydraulic Eng., Dept. of Eng., Islamic Azad University, Science and Research Branch, Tehran

4 Member of the Water Research Institute, Ministry of Energy; Ph.D. Student of Hydraulic Structures, College of Agriculture, Tarbiyat Modarres University, Tehran


The main goal of this research is to evaluate how selecting input variables with Principal Component Analysis (PCA) affects the performance of a Support Vector Machine (SVM) in monthly stream flow prediction. First, an SVM model is used to predict monthly flow as a function of 18 input variables. PCA is then employed to reduce these 18 input variables to 5 principal components (PCs), which are fed into a second SVM model (PCA-SVM). The performance of the SVM and PCA-SVM models is compared using a statistic developed by the authors. The findings show that preprocessing the input variables with PCA improved SVM performance.
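The SVM versus PCA-SVM comparison described in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes scikit-learn's `StandardScaler`, `PCA`, and `SVR` classes, an RBF kernel, and arbitrary hyperparameters (`C=10.0`, `epsilon=0.05`); it uses synthetic data (240 hypothetical months of 18 correlated inputs) in place of the study's stream flow records; and it uses RMSE as a stand-in for the authors' own evaluation statistic, which is not defined here.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(42)

# Synthetic stand-in data (NOT the study's records): 240 months x 18 inputs.
# The 18 columns are driven by 5 hidden factors plus noise, so they are
# strongly correlated, the situation in which PCA is useful.
n_months, n_inputs, n_factors = 240, 18, 5
factors = rng.normal(size=(n_months, n_factors))
loadings = rng.normal(size=(n_factors, n_inputs))
X = factors @ loadings + 0.3 * rng.normal(size=(n_months, n_inputs))
# Hypothetical target: a nonlinear function of two hidden factors plus noise.
y = np.tanh(factors[:, 0]) + 0.5 * factors[:, 1] ** 2 + 0.1 * rng.normal(size=n_months)

# Chronological split: first 180 months train, last 60 months test.
split = 180
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]


def rmse(obs, pred):
    """Root-mean-square error (stand-in for the authors' own statistic)."""
    return float(np.sqrt(np.mean((obs - pred) ** 2)))


def make_svr():
    # Hypothetical hyperparameters; in practice C, epsilon and gamma are tuned.
    return SVR(kernel="rbf", C=10.0, epsilon=0.05, gamma="scale")


# SVM: all 18 standardized inputs go straight into the SVR.
svm = make_pipeline(StandardScaler(), make_svr())
svm.fit(X_train, y_train)

# PCA-SVM: standardize, keep the first 5 PCs, then fit the SVR on the PC scores.
# The scaler and PCA are fitted on the training months only.
pca_svm = make_pipeline(StandardScaler(), PCA(n_components=5), make_svr())
pca_svm.fit(X_train, y_train)

y_pred_svm = svm.predict(X_test)
y_pred_pca_svm = pca_svm.predict(X_test)
pca = pca_svm.named_steps["pca"]

print(f"PCs kept: {pca.n_components_} of {n_inputs} inputs")
print(f"Variance explained by the PCs: {pca.explained_variance_ratio_.sum():.3f}")
print(f"Test RMSE, SVM:     {rmse(y_test, y_pred_svm):.3f}")
print(f"Test RMSE, PCA-SVM: {rmse(y_test, y_pred_pca_svm):.3f}")
```

Standardizing before PCA makes the components equivalent to an eigendecomposition of the correlation matrix, so inputs measured in different units do not dominate the leading PCs; whether the study standardized its inputs this way is an assumption. Because the data are synthetic, the printed RMSE values say nothing about whether PCA-SVM outperforms SVM on real stream flows.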

