Deep Abstract Generator
- Published on
- Duration: 6 Months
- Category: Master's Thesis
- Role: Developer
Project Overview
This master's thesis project explores adapting GPT-2, a pre-trained language model from OpenAI, to the task-specific challenge of generating coherent and contextually relevant scientific abstracts. Through fine-tuning, the project tailors the model's capabilities to the unique requirements of academic literature synthesis.
Methodology
The project involved two primary approaches:
- Fine-tuning GPT-2: Leveraging a pre-trained GPT-2 model and refining it on a curated dataset of scientific papers to improve its abstract generation capabilities (see the sketch after this list).
- Developing a Custom Transformer Model: Constructing a transformer model from scratch to address the nuances of scientific text and compare its effectiveness against the fine-tuned GPT-2 model.
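The fine-tuning approach can be illustrated with a minimal sketch. The snippet below assumes the Hugging Face Transformers library and a hypothetical plain-text file `abstracts.txt` containing one abstract per line; the actual data pipeline, hyperparameters, and training schedule used in the thesis may differ.

```python
# Minimal fine-tuning sketch: causal language modelling on a corpus of abstracts.
# Assumes: pip install torch transformers, and a hypothetical abstracts.txt file.
import torch
from torch.utils.data import DataLoader, Dataset
from transformers import GPT2LMHeadModel, GPT2TokenizerFast


class AbstractDataset(Dataset):
    """Tokenizes each abstract for causal language modelling."""

    def __init__(self, path, tokenizer, max_length=512):
        with open(path, encoding="utf-8") as f:
            self.texts = [line.strip() for line in f if line.strip()]
        self.tokenizer = tokenizer
        self.max_length = max_length

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        enc = self.tokenizer(
            self.texts[idx],
            truncation=True,
            max_length=self.max_length,
            padding="max_length",
            return_tensors="pt",
        )
        input_ids = enc["input_ids"].squeeze(0)
        attention_mask = enc["attention_mask"].squeeze(0)
        labels = input_ids.clone()
        labels[attention_mask == 0] = -100  # ignore padding positions in the loss
        return {"input_ids": input_ids, "attention_mask": attention_mask, "labels": labels}


device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = GPT2LMHeadModel.from_pretrained("gpt2").to(device)

loader = DataLoader(AbstractDataset("abstracts.txt", tokenizer), batch_size=4, shuffle=True)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

model.train()
for epoch in range(3):
    for batch in loader:
        batch = {k: v.to(device) for k, v in batch.items()}
        loss = model(**batch).loss  # cross-entropy over shifted tokens
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```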
Key Techniques Employed
- Dataset Assembly: Compilation of a substantial corpus from scientific journals.
- Model Training and Evaluation: Utilizing Python and PyTorch for model training, with evaluation based on METEOR, ROUGE, and BLEU metrics.
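For illustration, the snippet below shows one way the three overlap metrics could be computed for a single generated abstract against its reference, assuming the NLTK and rouge-score packages; it is a sketch of the evaluation idea, not the exact code used in the thesis.

```python
# Hedged sketch of metric computation. Assumes: pip install nltk rouge-score.
import nltk
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from nltk.translate.meteor_score import meteor_score
from rouge_score import rouge_scorer

nltk.download("wordnet", quiet=True)  # METEOR relies on WordNet

reference = "we propose a transformer based method for abstract generation"
generated = "we present a transformer method for generating abstracts"

# Recent NLTK versions expect pre-tokenized input for BLEU and METEOR.
ref_tokens, gen_tokens = reference.split(), generated.split()

# BLEU: n-gram precision with a brevity penalty (smoothing avoids zero scores).
bleu = sentence_bleu([ref_tokens], gen_tokens,
                     smoothing_function=SmoothingFunction().method1)

# METEOR: unigram matching with stemming and WordNet synonym support.
meteor = meteor_score([ref_tokens], gen_tokens)

# ROUGE-L: longest-common-subsequence overlap, reported here as an F-measure.
scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
rouge_l = scorer.score(reference, generated)["rougeL"].fmeasure

print(f"BLEU: {bleu:.3f}  METEOR: {meteor:.3f}  ROUGE-L: {rouge_l:.3f}")
```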
Results and Findings
The fine-tuned GPT-2 model significantly outperformed the custom transformer model, offering:
- Faster training and evaluation times.
- Higher accuracy in generating text that closely mirrors human-written abstracts.
- Improved efficiency in processing, making it a viable tool for real-time applications.
Comparative Analysis
| Metric | Fine-tuned GPT-2 | Custom Transformer |
| --- | --- | --- |
| Training Time | 9 hours | 75 hours |
| METEOR Score | 18.1% | 3.7% |
| ROUGE Score | 18.3% | 1.8% |
| BLEU Score | 21.6% | 3.1% |
Contributions to the Field
This thesis underscores the potential of fine-tuning pre-trained models over building new ones from scratch for specific tasks. The findings advocate for the application of deep learning techniques in automating aspects of scientific writing, potentially transforming how literature reviews are conducted.
Future Work
Suggestions for future research include:
- Expanding the training dataset to cover a broader range of disciplines.
- Exploring the integration of multi-lingual capabilities to support non-English texts.
- Enhancing the model's understanding of complex scientific concepts and terminologies.