Key Points:
- Factor selection is crucial when building financial models.
- Machine learning (ML) and data science can assist in factor discovery and creation.
- Factors should be economically meaningful and distinct to avoid multicollinearity.
- ML models do not require the same strict assumptions as traditional regression models.
- Selecting economically intuitive factors can enhance ML model outputs and provide greater clarity.
Factor selection, also known as “feature selection,” is a critical aspect of building financial models using machine learning (ML) and data science. These models help explain the behavior of target variables and identify the primary drivers of portfolio performance.
Traditionally, ordinary least squares (OLS) regression has been used as a simple method for constructing factor models. In this approach, the portfolio return is the dependent variable, and the risk factors are the independent variables. As long as the independent variables have low correlation with one another, different models can be statistically valid and explain varying degrees of portfolio behavior. The beta coefficient attached to each factor indicates how sensitive the portfolio’s return is to the behavior of that factor.
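To make the OLS setup concrete, here is a minimal sketch of a factor regression in Python. The factor labels (market, value, momentum), the "true" betas, and the return series are all simulated assumptions for illustration; none of them come from real data or from a specific published model.

```python
import numpy as np

rng = np.random.default_rng(0)
n_periods = 500

# Hypothetical factor returns (columns: market, value, momentum); simulated, not real data.
factors = rng.normal(0.0, 0.02, size=(n_periods, 3))
true_betas = np.array([1.0, 0.4, -0.2])  # assumed sensitivities used to build the toy portfolio
portfolio = 0.001 + factors @ true_betas + rng.normal(0.0, 0.005, size=n_periods)

# Add an intercept column so the regression estimates alpha alongside the betas.
X = np.column_stack([np.ones(n_periods), factors])
coef, *_ = np.linalg.lstsq(X, portfolio, rcond=None)
alpha, betas = coef[0], coef[1:]

# R-squared: share of portfolio return variance explained by the factors.
residuals = portfolio - X @ coef
r_squared = 1.0 - residuals.var() / portfolio.var()

print("alpha:", round(float(alpha), 4))
print("betas:", np.round(betas, 2))
print("R^2:", round(float(r_squared), 3))
```

Each estimated beta is read as the portfolio's sensitivity to that factor, holding the other factors fixed. On this simulated data the estimates should land close to the assumed values.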
In comparison, ML regression models can better account for non-linear behavior and interaction effects. However, they do not provide direct analogs of OLS regression output, such as beta coefficients. ML models do not require the inversion of a covariance matrix, nor do they rely on strict parametric assumptions, homoskedasticity, or other time series assumptions.
When selecting factors for ML models, it is essential to ensure that they are economically meaningful and distinct, so that multicollinearity is avoided. In traditional regression models, multicollinearity occurs when explanatory factors are highly correlated, leading to unreliable results. ML models do not face the same multicollinearity issues, but distinct and economically intuitive factors remain necessary for practical results.
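One common way to check whether candidate factors are distinct is the variance inflation factor (VIF). The sketch below is an illustration on simulated data: the factor names are hypothetical, and "value_dup" is built deliberately as a near-copy of "value" so that the redundancy shows up. It uses the identity that each VIF equals the corresponding diagonal entry of the inverse correlation matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000

# Simulated candidate factors (hypothetical names, not real data).
value = rng.normal(size=n)
momentum = rng.normal(size=n)
# "value_dup" is a deliberately redundant factor: value plus a little noise.
value_dup = value + 0.1 * rng.normal(size=n)

X = np.column_stack([value, momentum, value_dup])
names = ["value", "momentum", "value_dup"]

# Pairwise correlations between the candidate factors.
corr = np.corrcoef(X, rowvar=False)

# VIF_j = 1 / (1 - R_j^2), which equals the j-th diagonal entry
# of the inverse correlation matrix.
vif = np.diag(np.linalg.inv(corr))

for name, v in zip(names, vif):
    print(f"{name}: VIF = {v:.1f}")
```

A VIF near 1 suggests a factor carries information the others do not. Values above roughly 10 are often treated as a warning sign, though that threshold is a rule of thumb and not a formal test.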
The least absolute shrinkage and selection operator (LASSO) technique can be employed in the pre-model stage to select distinct and economically intuitive factors. This technique allows model builders to distill a large set of factors into a smaller set while maintaining considerable explanatory power and maximum independence among the factors.
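As a rough sketch of how LASSO can prune a candidate set, the example below uses scikit-learn's `Lasso` on simulated data. The ten candidate factors, the three that actually drive the target, and the penalty strength `alpha=0.05` are all assumptions chosen for illustration, not recommended settings.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
n, n_candidates = 600, 10

# Simulated candidate factors; only the first three actually drive the target.
X = rng.normal(size=(n, n_candidates))
y = 0.8 * X[:, 0] - 0.5 * X[:, 1] + 0.3 * X[:, 2] + 0.1 * rng.normal(size=n)

# Standardize so the L1 penalty treats every factor on the same scale.
X_std = StandardScaler().fit_transform(X)

# alpha sets the penalty strength; 0.05 is an arbitrary illustrative value.
model = Lasso(alpha=0.05).fit(X_std, y)

# Factors whose coefficients were shrunk exactly to zero are dropped.
selected = np.flatnonzero(model.coef_)
print("selected factor indices:", selected)
print("coefficients:", np.round(model.coef_[selected], 2))
```

The L1 penalty shrinks the coefficients of uninformative factors to exactly zero, which is what makes LASSO usable as a selection step. In practice the penalty strength would usually be chosen by cross-validation, for instance with scikit-learn's `LassoCV`, instead of being fixed by hand.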
Using economically meaningful factors in ML-driven models brings several benefits. These factors have decades of research and empirical validation supporting their relevance. Researchers have studied them extensively in traditional regression models and other frameworks. When incorporated into ML models, these factors can better explain asset returns and contribute to successful investment trading models.
Furthermore, using economically meaningful factors helps in understanding ML model outputs. ML models provide relative feature importance values that describe the explanatory power of each factor relative to others. These values are easier to interpret when the economic relationships among the factors are clear.
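To illustrate what relative feature importance values look like, here is a minimal sketch using a random forest from scikit-learn. The factor names and the simulated target, which includes an interaction between value and momentum, are hypothetical. Impurity-based importance is only one of several importance measures.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)
n = 1000
names = ["market", "value", "momentum", "noise"]  # hypothetical labels

# Simulated factor data, not real returns.
X = rng.normal(size=(n, 4))
# Non-linear target: market matters most, value and momentum interact,
# and "noise" is irrelevant by construction.
y = 1.0 * X[:, 0] + 0.5 * X[:, 1] * X[:, 2] + 0.1 * rng.normal(size=n)

forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# Impurity-based importances are non-negative and sum to 1,
# so each value is a share relative to the other factors.
importances = forest.feature_importances_

for name, imp in sorted(zip(names, importances), key=lambda t: -t[1]):
    print(f"{name}: {imp:.2f}")
```

Unlike a beta coefficient, an importance value carries no sign and no unit. It only ranks factors against one another, which is why clear economic relationships among the factors help when reading the output.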
In conclusion, while ML models offer flexibility and the ability to handle various inputs, selecting economically meaningful factors is essential. This ensures that ML-driven investment frameworks remain understandable and that the models driving the investment process are complete and instructive.