# Automatic Extraction of Hierarchy in Multivariate Data (2013-)

# Ilknur Icke Home

## Motivation

Hierarchy is everywhere. How can we model it using symbolic regression?Given a hierarchical system with N state variables (v1,v2, ..., vN) and K inputs (s1, s2, ..., sK)

figure 1

symbolic regression will return a set of flat expressions:

v3 = v2 * (s1 - s2)

v2 = s3 * s4

v1 = s1 - s2

The more complex the system is, the less intelligible the models will be. Here, we proposed to build functional dependency graphs based on symbolic regression.

## From Data to Hierarchy: the Naive Approach

figure 2

## The Hierarchical Approach

Instead of the naive approach, we pursue a bottom-up approach. At the beginning, only the stimuli (input) variables are included in the set of independent variables. Until all state variables are modeled:- at each epoch, the state variable that is best explained by a subset of the current independent variables is selected
- these independent variables are removed from the set of independent variables and the selected state variable is added instead
- the identifed functional relationship is marked on the adjacency matrix

## Our hypotheses

- the naive approach does not guarantee discovery of hierarchy
- low (mean) prediction error does not necessarily mean that the correct topology is captured
- if you want to capture hierarchy, you need to explicitly seek hierarchy

figure 4

## Experimental Results

The results are reported in terms of three measures:- The percentage of correct edges recovered (1)
- Error on testing dataset (2)
- Percentage of correct edges recovered versus error on testing dataset (3)

figure 5

## Conclusions

- for binary systems, the hierarchical approaches continue to outperform the naive approach as the layers of hierarchy in the hidden target system increases
- for three-layer hierarchical hidden target systems, the hierarchical approach outperforms the naive approach until arity=5
- the diffculty of scaling (all three approaches) to hierarchical systems with higher arity is due to the inherent issue of feature selection in symbolic regression (being studied separately)

## Publications

[1] Icke, I., and Bongard, J., Automatic identification of hierarchy in multivariate data, GECCO 2013, Amsterdam, Netherlands poster (pdf)[2] Icke, I., and Bongard, J., Modeling Hierarchy Using Symbolic Regression, IEEE CEC 2013, Cancun, Mexico