Presentation
Time:
Ph.D. Dissertation Defense: Salomé Perez-Rosero
Date:
9:00 am – 11:00 am
Avery Hall
Room: 103C
1144 T St
Lincoln NE 68508
Additional Info: AVH
Virtual Location:
Zoom
Target Audiences:
“Mining Work Items to Streamline Software Maintenance Tasks”
Software maintenance tasks often require grouping code changes into related units of work, so that developers have as much information as possible about how a specific code task was addressed. A comprehensive understanding of how a code task has evolved helps developers make better decisions about changes to the overall codebase. Here, a commit represents the set of code changes made to the codebase at a specific time. While the concept of work items as logically related code changes has been primarily theoretical, its impact on software maintenance tasks, such as tracing the origins of bugs or fixes that span multiple commits in the commit histories of real-world software repositories, remains unexplored. This thesis introduces heuristic-based algorithms to mine work items from commit histories in open-source repositories across different scenarios.
First, for cases where issue tags are available throughout the commit history, we developed a heuristic that mines associations from validated issue tags from issue tracker systems such as Jira and GitHub. We generated a dataset of approximately 130,000 work items across repositories written in Java, Kotlin, and Python, with each work item group carrying a numerical confidence score for relatedness. Second, for scenarios where a reference commit is known and the goal is to generate work items associated with it, we developed a second heuristic that implements a method-level tracking mechanism. This approach scans the repository’s commit history backward, identifying overlapping code modifications linked to the reference commit to generate related work items. Third, when a fast, automated way to identify work items is needed, we explore using pre-trained LLMs with prompts containing different levels of detail from commit diffs and logs to classify commit pairs as related or unrelated work items. Alongside this, we generate two work item datasets with labeled ground truth for fine-tuning purposes.
Finally, we apply our top-performing work item heuristic to a software maintenance task in the context of the SZZ algorithms, which aim to identify the commit that introduced a bug, given the commit that fixed it. Specifically, we built a new SZZ variant that integrates work item awareness. It produced the first empirical evidence that bugs and fixes constitute work items, and it achieved a 4-18% improvement in bug-introducing commit identification over traditional SZZ algorithms.
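As a rough illustration of the first scenario in the abstract (grouping commits that share an issue tag), the sketch below may help. It is not the thesis's heuristic: the function name, the tag patterns, and the sample commits are all assumptions made for this example, and it skips both the validation of tags against an issue tracker and the confidence scoring that the abstract mentions.

```python
import re
from collections import defaultdict

# Assumed tag formats (hypothetical, for illustration only):
# Jira-style keys such as "PROJ-12" and GitHub-style references such as "#7".
ISSUE_TAG = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b|#\d+")


def group_by_issue_tag(commits):
    """Group commit ids by the issue tags found in their messages.

    `commits` is an iterable of (commit_id, message) pairs in history order.
    Only tags mentioned by at least two commits are returned, since a single
    commit does not form a group of related changes.
    """
    groups = defaultdict(list)
    for commit_id, message in commits:
        # Deduplicate tags so a commit is listed once per tag.
        for tag in sorted(set(ISSUE_TAG.findall(message))):
            groups[tag].append(commit_id)
    return {tag: ids for tag, ids in groups.items() if len(ids) > 1}


if __name__ == "__main__":
    # Made-up commit history, not taken from any real repository.
    sample = [
        ("a1", "PROJ-12 add parser"),
        ("b2", "Fix typo"),
        ("c3", "PROJ-12: handle empty input, see #7"),
        ("d4", "closes #7"),
        ("e5", "PROJ-99 unrelated"),
    ]
    for tag, ids in group_by_issue_tag(sample).items():
        print(tag, ids)
```

A commit may appear in more than one group when its message mentions several tags, which is one reason a real approach would likely need a relatedness score rather than a plain grouping.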
Committee:
Dr. Robert Dyer and Dr. Witty Srisa-an, Advisors
Dr. Bonita Sharif
Dr. Lisong Xu
Dr. Qiuming Yao
Dr. Yi Qian