
The Integration of Large Language Models and Knowledge Bases – Searching, QA, and Navigation of Knowledge Bases with Natural Language.

Basic information
Project identifier: AS-ASCDC-113-202
Conducted by: Institute of Information Science
Director
Overview

In November 2022, OpenAI released ChatGPT, which made a strong impression on the AI community. It can understand most human languages, including Chinese, and respond to a wide range of questions in them. It can hold a conversation, write fiction, and even write code. Recently, many companies have begun using the ChatGPT API to build customer service bots that talk with users about their products on websites or social media. This application suggests a bold idea: if ChatGPT can help companies converse naturally with users about their products, could digital humanities platforms or institutions such as museums also use it to converse naturally with users about their collection data or artifacts?

Such natural-language interaction would be like having a guide from the collecting institution always at hand, ready to explain, answer, converse, and even ask questions in return. For instance, while a visitor views a picture or 3D image of the Jadeite Cabbage online, a “digital guide” played by ChatGPT could converse with them in real time. Or a humanities scholar studying the life of Li Wei, a Qing Dynasty figure, could use ChatGPT as a “consultant” for open-ended inquiries, feedback, and viewpoints, all grounded in the collection data or artifacts rather than in what ChatGPT memorized during training. Such a tool may significantly benefit digital humanities research methods and give general users an easier way into digital humanities.

I therefore intend to put this idea into practice by collaborating closely with the Digital Humanities Platform team at Academia Sinica to create a virtual tour guide that uses natural language to search, interact, and explain on behalf of users. The project is planned over two years. In the first year, the focus is limited to fact-based QA with users in natural language. In the second year, the focus shifts to natural-language interaction and explanation. The plan has two strategic directions:

1. Using OpenAI's ChatGPT or GPT-4 API as the core, we will enable it to query collection data precisely and conduct factual QA with users.
2. Because the ChatGPT and GPT-4 APIs are paid services, and because operational independence and the privacy of collection data also matter, we need to develop and own a large language model. Such a model can be pretrained and fine-tuned on collection data and serves as a free alternative.

 

Find out more

Chatting with Historical Figures
Digital Museum of Capturing the Ancients from Codices
