Statistics Weekly Seminar - Katherine Goode
FORESTR: Searching for patterns in random forests
3:00 pm – 4:00 pm
Hardin Hall - North Wing, HARH 49
3310 Holdrege Street
Lincoln NE 68583-0963
Virtual Location:
Zoom
Contact:
Department of Statistics, statistics@unl.edu
Abstract:
Random forests have become a popular tool for data-driven predictions and, as a result, are used, or considered for use, in national security mission applications. While individual regression and decision trees are typically considered interpretable, random forests are inherently difficult to interpret because they aggregate an ensemble of many trees. This lack of model transparency may be undesirable in high-consequence applications. We aim to increase the interpretability of random forests by finding patterns in the ensemble of trees. As a starting point, we develop a new distance metric that quantifies the similarity between trees based on their topologies (i.e., shapes). The metric is built on a novel graph distance metric that is a proper mathematical distance, is invariant to transformations, provides registration between graphs, and computes topological evolutions between graphs. The tree distance metric enables the computation of tree statistics (e.g., a "mean" tree) and the identification of tree clusters. Applications to a toy dataset and a mission-relevant product inspection dataset demonstrate how the metric provides insight into random forests. Finally, we discuss limitations of the approach and ideas for future research.
Bio:
Katherine Goode is a statistician at Sandia National Laboratories, specializing in the development and evaluation of explainable machine learning. Her current research focuses on the responsible use of AI in national security applications, enhancing human-AI interaction, and machine learning for extreme weather forecasting. Katherine holds a PhD in statistics from Iowa State University, an MS in statistics from the University of Wisconsin-Madison, and a BA in mathematics from Lawrence University.
This event originated in Statistics Seminar.