All events are in Central time unless specified.
Presentation

Ph.D. Dissertation Defense: Praval Sharma

Date: Monday, July 29, 2024
Time: 1:00 pm – 3:00 pm
Avery Hall Room: 347
1144 T St
Lincoln NE 68508
Additional Info: AVH
Virtual Location: Zoom
Monday, July 29, 2024
1:00 PM CDT
347 Avery Hall
Zoom: https://unl.zoom.us/j/92320107889

“Event-Specific Spatial and 5Ws Information Extraction from Structured Documents”

Event-specific spatial and 5Ws information extraction aims to extract the key aspects of events, such as location, time, and participants. These aspects are critical for understanding and analyzing events. Structured documents, such as news reports, are primary sources of information on events and can be leveraged to extract their essential elements. Many approaches have been proposed for extracting important event details from structured documents. However, several challenges remain. First, existing approaches for event-specific spatial information extraction, specifically place name extraction, lack spatial awareness and are ineffective in extracting lesser-known but high-spatial-resolution place names for large parts of the world. Second, current geocoders, while effective at resolving ambiguous but well-known toponyms such as London and Paris, have limited efficacy in geocoding lesser-known non-gazetteer place names. Third, large, manually verified datasets for event 5Ws extraction are not readily available. Finally, most existing event extraction algorithms are based on the closed-domain event extraction paradigm and thus do not generalize to unrestricted event types.

To overcome these challenges, we develop novel event-specific spatial and 5Ws information extraction approaches and datasets in this research. First, we have developed a novel place name extractor that uses a hybrid knowledge-driven and data-driven method to become spatially aware of a geographic context and thereby extract lesser-known places effectively. Second, we have introduced an unsupervised data-driven algorithm that leverages the spatial patterns and context of place names to geocode non-gazetteer place names. Third, we have created the largest dataset for open-domain event 5Ws through manual annotation and verified it using statistical reliability measures. Finally, we have developed an open-domain event 5Ws extraction algorithm that employs an ensemble strategy using multiple large language models and contrastive learning. The experimental results show that our algorithms outperformed state-of-the-art place name recognizers and geocoders, and that our ensemble strategy improved the accuracy of event 5Ws extraction. The broader impact of our research is the advancement of event-specific information extraction, which is necessary for improving event analysis and natural language understanding.

Committee:
Ashok Samal (Chair)
Leen-Kiat Soh (Co-Chair)
Stephen Scott (Reader)
Qiuming Yao (Reader)
Michael Hayes (Outside Representative)
Deepti Joshi (Special Member)
