
Automatic Extraction of Historical Figures and Events

Basic information
Project identifier: AS-ASCDC-111-202
Conducted by: Institute of Information Science
Director:
Overview

Over the past five years (2017-2021), the Institute of Information Science, the Institute of History and Philology, and the Institute of Linguistics have jointly built a Chinese-language markup and information retrieval platform. It gives humanities scholars a digital environment in which to process high-quality, value-added big data and carry out a range of analyses. The platform offers multiple functions. For example, it can automatically identify proper nouns in Chinese texts and link personal names to places, official positions, and organizations, thereby determining who held which official post. This capability has in turn supported an automated official career analysis application. To our knowledge, this platform is the first of its kind in digital humanities research.

Beginning this year (2022), we are turning our attention to the automated analysis of political achievements. Given a text about a person who held office, we aim to identify automatically what that person did, what events occurred around them, and where they resided, and ultimately to extract in full what a given person did in a given place after attaining a given official position. These descriptions and characterizations of historical figures must be detailed enough to provide large-scale material for humanities scholars' analyses. The main challenges are that no existing data are labeled with categories of political achievement, authority files for events are lacking, and few scholars have systematically defined or labeled "political achievements" or "events." As a result, there is no suitable training corpus for an automatic recognition model.

One direct solution is large-scale manual annotation, but this work is time-consuming, labor-intensive, and impractical. Our goal is to achieve automated event identification and political-achievement analysis with little or no manual annotation. Recent advances in transfer learning in AI suggest an approach: training corpora for the semantic analysis of Modern Chinese are abundant, and although they carry no labels for political achievements, they do carry rich grammatical and semantic annotation. Could a semantic analysis model trained on Modern Chinese be transferred automatically to Classical Chinese and used there to predict categories of political achievements or events?

Starting in 2022, our partner, the Institute of History and Philology's "Project to Digitally Innovate Academic Settings," has focused on the study of local chronicles. Besides information on officials, local chronicles also contain chapters on officials' achievements, which can serve as research material. We will therefore continue to assist the Institute of History and Philology in automatically extracting authority files for personal names and official titles, while also collaborating with it on research into automated event identification and political-achievement analysis. As in previous years, we will integrate the research results into ASCDC's research platform, the Digital Analysis System for Humanities.

Find out more

Automated Links Between Names of People and Officials in Ancient Chinese Corpora

Chinese Named Entity Recognition System with Confidence Scores

Official Career Analysis System

Online Ontological Framework for Old Chinese

Online Ontological Framework for Middle Chinese

Online Ontological Framework for Early Mandarin

Online Ontological Framework for Modern Chinese

Academia Sinica Tagged Corpus of Old Chinese

Academia Sinica Tagged Corpus of Middle Chinese

Academia Sinica Tagged Corpus of Early Mandarin Chinese

Academia Sinica Tagged Corpus of Modern Chinese
