Symbolic Regression (SR) is the task of discovering free-form mathematical models from observed data; when the search is carried out with Genetic Programming (GP), the technique is called GP-SR. Despite being the most popular topic within the GP community, and despite a number of user-friendly implementations already being available, SR has not yet been accepted as a standard data science tool. This reluctance is due in part to the random nature of GP, which makes its behavior hard to analyze, and in part to scalability issues. The most popular deterministic regression algorithms, on the other hand, were designed to generate a limited set of functional forms and therefore lack the flexibility of GP-SR.

In this ongoing work, our long-term hypothesis is that a hybrid of these two approaches will outperform either approach alone. So far, we have shown that using the deterministic approach as a feature extractor/selector for GP-SR improves the performance of GP-SR.
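The two-stage idea can be illustrated with a minimal sketch. It is not the method from the publications listed below: the deterministic learner used in this work is not named here, so a simple correlation ranking stands in for it, and the GP stage is a toy mutation-only search. All function names, parameters, and the synthetic data are assumptions made for illustration.

```python
import itertools
import math
import operator
import random

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = math.sqrt(sum((a - mx) ** 2 for a in xs))
    sy = math.sqrt(sum((b - my) ** 2 for b in ys))
    return cov / (sx * sy)

def select_features(columns, y, k):
    """Deterministic stage (stand-in): keep the k columns most correlated with y."""
    ranked = sorted(columns, key=lambda name: -abs(pearson(columns[name], y)))
    return ranked[:k]

def random_tree(features, depth, rng):
    """Random expression tree: a feature name or a tuple (op, left, right)."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(features)
    op = rng.choice(list(OPS))
    return (op, random_tree(features, depth - 1, rng),
            random_tree(features, depth - 1, rng))

def evaluate(tree, row):
    if isinstance(tree, str):
        return row[tree]
    op, left, right = tree
    return OPS[op](evaluate(left, row), evaluate(right, row))

def mse(tree, rows, y):
    """Mean squared error; non-finite errors are mapped to infinity."""
    total = 0.0
    for row, target in zip(rows, y):
        diff = evaluate(tree, row) - target
        total += diff * diff
    err = total / len(y)
    return err if math.isfinite(err) else math.inf

def mutate(tree, features, rng):
    """Replace one randomly chosen subtree with a fresh random subtree."""
    if isinstance(tree, str) or rng.random() < 0.3:
        return random_tree(features, 2, rng)
    op, left, right = tree
    if rng.random() < 0.5:
        return (op, mutate(left, features, rng), right)
    return (op, left, mutate(right, features, rng))

def evolve(features, rows, y, generations=40, pop_size=60, seed=0):
    """GP stage (toy): elitist, mutation-only search over the selected features."""
    rng = random.Random(seed)
    # Seed the population with the bare features, then fill with random trees.
    pop = list(features) + [random_tree(features, 3, rng)
                            for _ in range(pop_size - len(features))]
    for _ in range(generations):
        pop.sort(key=lambda t: mse(t, rows, y))
        survivors = pop[: pop_size // 2]
        pop = survivors + [mutate(rng.choice(survivors), features, rng)
                           for _ in survivors]
    return min(pop, key=lambda t: mse(t, rows, y))

# Synthetic data (assumed): a full grid over four variables; only x0 and x1 matter.
names = ["x0", "x1", "x2", "x3"]
rows = [dict(zip(names, levels))
        for levels in itertools.product([1.0, 2.0, 3.0], repeat=4)]
y = [r["x0"] * r["x1"] + r["x0"] for r in rows]
columns = {name: [r[name] for r in rows] for name in names}

selected = select_features(columns, y, k=2)   # deterministic stage
best = evolve(selected, rows, y)              # GP-SR restricted to selected features
print(selected, best, mse(best, rows, y))
```

The point of the split is that the GP stage never sees the irrelevant columns, so its search space shrinks before any random search begins. A stronger deterministic learner could replace `select_features` without changing the rest of the sketch.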


[1] Icke, I., and Bongard, J., Improving Symbolic Regression using Deterministic Machine Learning, IEEE CEC 2013, Cancun, Mexico
[2] Icke, I., Allgaier, N. A., Danforth, C. M., Whelan, R. A., Garavan, H. P., Bongard, J. C., and IMAGEN Consortium, A Deterministic and Symbolic Regression Hybrid Applied to Resting-State fMRI Data, Genetic Programming Theory and Practice 2013