
Automatic Extraction of Historical Figures and Events (2) – Automatic Construction of Historical Figure Event Graph

Basic information
Project identifier: AS-ASCDC-112-202
Conducted by: Institute of Information Science
Director
Overview

From 2017 to 2021, the Institute of Information Science, the Institute of History and Philology (IHP), and the Institute of Linguistics cooperated to build a Chinese-language text annotation and information retrieval system that gives humanities scholars a big-data research environment. The data is high-quality and value-added, which makes it convenient for humanities scholars to conduct a variety of analyses and studies. The system offers multiple functions. For example, it can automatically identify proper nouns in Chinese texts and automatically link personal names, place names, official titles, and organization names to one another, such as determining who held which official position. On this basis we also developed an automated official career analysis system. As far as we know, this is pioneering work in digital humanities research.

In 2022 we began a three-year project (2022-2024) that focuses further on automated political performance analysis. The aim is to determine automatically from the text what a person did after becoming an official, what events took place, and where the person stayed, and ultimately to know comprehensively what someone did in a given place while holding a given office. Describing a historical figure at this level requires much finer detail, and in return it can provide a large amount of material for humanities scholars to analyze. The main challenge is that no training data exists and authority files for events are lacking. Few scholars have systematically defined and labeled "political achievement classification" or "events," so there is no suitable training corpus for an automatic recognition model.

The direct solution to this problem is large-scale manual annotation, but that work is time-consuming, labor-intensive, and impractical. Our goal is to achieve automated event identification and political performance analysis with no manual labeling, or with very little. Recent work on transfer learning in AI has inspired us. Training corpora for the semantic analysis of modern Chinese are abundant: although they carry no labels for political achievement classification, they are rich in grammatical and semantic annotation. Is it therefore possible to transfer a modern Chinese semantic analysis model automatically, so that it can be applied to classical Chinese to predict political achievement classes or events?

Recently, our partner, the Institute of History and Philology, has focused on the study of local chronicles. In addition to information about officials, local chronicles contain chapters on officials' achievements, which can serve as research material. On the one hand, we therefore continue to assist with the automatic extraction of personal names and official-title authority files from historical texts; on the other hand, we also cooperate on research into automated event identification and political performance analysis. As in previous years, we will integrate the research results into the Digital Analysis System for Humanities of the Digital Cultural Center.

Find out more

Chinese Dependency Parser System

Word Segmentation and POS tagging for Ancient Chinese

Name-Official Automatic Extraction and Career Analysis System (2.0)

Event Extraction for Qing Dynasty Characters

Chinese Corpus Knowledge Ontology Architecture Online Browsing System

Ancient Chinese

Middle Chinese

Early Chinese

Modern Chinese

Academia Sinica Tagged Corpus of

Ancient Chinese

Middle Chinese

Early Chinese

Modern Chinese
