Deep Abstract Generator
- Published on
- Duration: 6 Months
- Category: Master's Thesis
- Role: Developer
Project Overview
This master's thesis project explores adapting GPT-2, a pre-trained language model from OpenAI, to the task-specific challenge of generating coherent and contextually relevant scientific abstracts. Through fine-tuning, the project tailors the model's capabilities to the unique requirements of academic literature synthesis.
Methodology
The project involved two primary approaches:
- Fine-tuning GPT-2: Leveraging a pre-trained GPT-2 model and refining it on a curated dataset of scientific papers to improve its abstract generation capabilities (see the sketch after this list).
- Developing a Custom Transformer Model: Constructing a transformer model from scratch to address the nuances of scientific text and compare its effectiveness against the fine-tuned GPT-2 model.
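The fine-tuning approach can be illustrated with a minimal sketch. The snippet below assumes the Hugging Face Transformers library and a hypothetical plain-text file `abstracts.txt` containing one abstract per line; the actual data pipeline, hyperparameters, and training schedule used in the thesis may differ.

```python
# Minimal fine-tuning sketch: causal language modelling on a corpus of abstracts.
# Assumes: pip install torch transformers, and a hypothetical abstracts.txt file.
import torch
from torch.utils.data import DataLoader, Dataset
from transformers import GPT2LMHeadModel, GPT2TokenizerFast


class AbstractDataset(Dataset):
    """Tokenizes each abstract for causal language modelling."""

    def __init__(self, path, tokenizer, max_length=512):
        with open(path, encoding="utf-8") as f:
            self.texts = [line.strip() for line in f if line.strip()]
        self.tokenizer = tokenizer
        self.max_length = max_length

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        enc = self.tokenizer(
            self.texts[idx],
            truncation=True,
            max_length=self.max_length,
            padding="max_length",
            return_tensors="pt",
        )
        input_ids = enc["input_ids"].squeeze(0)
        attention_mask = enc["attention_mask"].squeeze(0)
        labels = input_ids.clone()
        labels[attention_mask == 0] = -100  # ignore padding positions in the loss
        return {"input_ids": input_ids, "attention_mask": attention_mask, "labels": labels}


device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = GPT2LMHeadModel.from_pretrained("gpt2").to(device)

loader = DataLoader(AbstractDataset("abstracts.txt", tokenizer), batch_size=4, shuffle=True)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

model.train()
for epoch in range(3):
    for batch in loader:
        batch = {k: v.to(device) for k, v in batch.items()}
        loss = model(**batch).loss  # cross-entropy over shifted tokens
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```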
Key Techniques Employed
- Dataset Assembly: Compilation of a substantial corpus from scientific journals.
- Model Training and Evaluation: Utilizing Python and PyTorch for model training, with evaluation based on METEOR, ROUGE, and BLEU metrics.
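For illustration, the snippet below shows one way the three overlap metrics could be computed for a single generated abstract against its reference, assuming the NLTK and rouge-score packages; it is a sketch of the evaluation idea, not the exact code used in the thesis.

```python
# Hedged sketch of metric computation. Assumes: pip install nltk rouge-score.
import nltk
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from nltk.translate.meteor_score import meteor_score
from rouge_score import rouge_scorer

nltk.download("wordnet", quiet=True)  # METEOR relies on WordNet

reference = "we propose a transformer based method for abstract generation"
generated = "we present a transformer method for generating abstracts"

# Recent NLTK versions expect pre-tokenized input for BLEU and METEOR.
ref_tokens, gen_tokens = reference.split(), generated.split()

# BLEU: n-gram precision with a brevity penalty (smoothing avoids zero scores).
bleu = sentence_bleu([ref_tokens], gen_tokens,
                     smoothing_function=SmoothingFunction().method1)

# METEOR: unigram matching with stemming and WordNet synonym support.
meteor = meteor_score([ref_tokens], gen_tokens)

# ROUGE-L: longest-common-subsequence overlap, reported here as an F-measure.
scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
rouge_l = scorer.score(reference, generated)["rougeL"].fmeasure

print(f"BLEU: {bleu:.3f}  METEOR: {meteor:.3f}  ROUGE-L: {rouge_l:.3f}")
```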
Results and Findings
The fine-tuned GPT-2 model significantly outperformed the custom transformer model, offering:
- Faster training and evaluation times.
- Higher accuracy in generating text that closely mirrors human-written abstracts.
- Improved efficiency in processing, making it a viable tool for real-time applications.
Comparative Analysis
| Metric | Fine-tuned GPT-2 | Custom Transformer |
| --- | --- | --- |
| Training Time | 9 hours | 75 hours |
| METEOR Score | 18.1% | 3.7% |
| ROUGE Score | 18.3% | 1.8% |
| BLEU Score | 21.6% | 3.1% |
Contributions to the Field
This thesis underscores the potential of fine-tuning pre-trained models over building new ones from scratch for specific tasks. The findings advocate for the application of deep learning techniques in automating aspects of scientific writing, potentially transforming how literature reviews are conducted.
Future Work
Suggestions for future research include:
- Expanding the training dataset to cover a broader range of disciplines.
- Exploring the integration of multi-lingual capabilities to support non-English texts.
- Enhancing the model's understanding of complex scientific concepts and terminologies.