This project fine-tunes the Llama 1B model to generate titles for research papers based on their abstracts, using a custom dataset of title-abstract pairs. By combining LoRA (Low-Rank Adaptation) with quantization, the model is optimized for efficient training and inference.
- Project Overview
- Dataset
- Model and Techniques
- Installation
- Usage
- Project Structure
- Results
- Acknowledgments
The aim of this project is to generate accurate and relevant research paper titles by training a language model to understand the abstract and context of each paper. By employing Llama 1B as the base model, this fine-tuning process demonstrates how pre-trained language models can be adapted for specialized NLP tasks such as title generation.
- Dataset: A dataset of research papers containing two columns: `title` and `abstract`.
- Data Preprocessing: The dataset is preprocessed to ensure high-quality input, and tokenization is performed using the Llama tokenizer.
- Model: Llama 1B model by Meta AI, chosen for its balance between performance and efficiency.
- Quantization: Quantization is applied alongside LoRA to make fine-tuning feasible on smaller hardware setups by reducing memory usage.
- Training: The model is fine-tuned using Hugging Face's Trainer API, which simplifies the training loop, handling evaluation metrics and model checkpoints.
- Evaluation: The model is evaluated based on title generation accuracy and loss metrics, which help measure its ability to generalize to unseen abstracts.
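The notebook walks through these steps end to end. As a rough orientation, the sketch below shows one way the pieces can fit together; the `meta-llama/Llama-3.2-1B` checkpoint, 4-bit loading via `bitsandbytes`, the prompt format, and all hyperparameters are illustrative assumptions rather than values taken from the notebook.

```python
# Minimal fine-tuning sketch. Model id, prompt format, and hyperparameters are
# illustrative assumptions, not values copied from the project notebook.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "meta-llama/Llama-3.2-1B"  # assumed base checkpoint (see Acknowledgments)

tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # Llama tokenizers ship without a pad token

# Load the base model with 4-bit quantization to cut memory usage.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# Attach LoRA adapters so only a small set of low-rank weights is trained.
model = get_peft_model(
    model,
    LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
               target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM"),
)

# Turn each title/abstract pair into an "abstract -> title" training example.
dataset = load_dataset("csv", data_files="data/titles_abstracts.csv")["train"]
splits = dataset.train_test_split(test_size=0.1, seed=42)

def to_features(example):
    text = f"Abstract: {example['abstract']}\nTitle: {example['title']}{tokenizer.eos_token}"
    return tokenizer(text, truncation=True, max_length=512)

tokenized = splits.map(to_features, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="llama_1B_lora_finetuned",
        per_device_train_batch_size=4,
        num_train_epochs=3,
        learning_rate=2e-4,
        logging_steps=50,
    ),
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
print(trainer.evaluate())  # reports eval_loss on the held-out split
```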
To replicate this project, set up the environment by installing the necessary libraries:
```bash
# Clone the repository
git clone https://github.com/your_username/llama-title-generator.git
cd llama-title-generator

# Install dependencies
pip install -r requirements.txt
```
The `requirements.txt` file can be regenerated from your environment with:

```bash
pip freeze > requirements.txt
```
- Data Preparation:
  - Ensure your dataset is structured with `title` and `abstract` columns.
  - Save the dataset as `data/titles_abstracts.csv` (a quick validation snippet is shown after this list).

- Training the Model:
  - Use the Jupyter notebook to load and preprocess the dataset, initialize the model, and start fine-tuning:

    ```bash
    jupyter notebook llm_llama_1b_finetune_generate_title.ipynb
    ```

- Evaluating the Model:
  - After training, evaluate the model on a validation dataset to verify its performance.

- Inference:
  - Use the model to generate titles from new abstracts by running the inference section of the notebook (an illustrative sketch follows this list).
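Before training, it can help to sanity-check the CSV. This is a small hedged example using pandas; the column names come from the Dataset section, and the cleanup step is only a suggestion.

```python
# Quick sanity check of the dataset file expected by the notebook.
import pandas as pd

df = pd.read_csv("data/titles_abstracts.csv")
assert {"title", "abstract"} <= set(df.columns), "CSV must have 'title' and 'abstract' columns"
df = df.dropna(subset=["title", "abstract"])  # drop incomplete rows before training
print(f"{len(df)} usable title-abstract pairs")
```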
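For inference, follow the notebook's own code; the sketch below shows one plausible way to load the saved LoRA adapter from `llama_1B_lora_finetuned/` and generate a title, assuming the same base checkpoint and prompt format as the training sketch above.

```python
# Illustrative inference sketch: load the LoRA adapter and generate a title.
# The base checkpoint and prompt format are assumptions, not notebook code.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Llama-3.2-1B"      # assumed base checkpoint
adapter_dir = "llama_1B_lora_finetuned"  # directory produced by fine-tuning

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
model = PeftModel.from_pretrained(model, adapter_dir)
model.eval()

abstract = "We propose a method for ..."  # replace with a real abstract
prompt = f"Abstract: {abstract}\nTitle:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=32,
        do_sample=False,
        pad_token_id=tokenizer.eos_token_id,
    )

# Decode only the newly generated tokens (everything after the prompt).
title = tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(title.strip())
```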
```
.
├── __pycache__/                                # Python bytecode cache directory
├── llama_1B_lora_finetuned/                    # Directory containing the fine-tuned Llama model (LoRA adapter)
├── Hugginface_prasun.py                        # Script for Hugging Face model integration and utilities
├── llm_llama_1b_finetune_generate_title.ipynb  # Jupyter notebook for fine-tuning Llama and generating titles
├── requirements.txt                            # Project dependencies and their versions
└── title_maker.py                              # Core script for title generation functionality
```
The fine-tuned model shows promising results in generating titles that are contextually relevant to the provided abstracts. Further evaluation metrics are saved in the notebook.
- Meta AI's Llama 3.2-1B model (available on Hugging Face) for providing the foundational pre-trained language model used in this project.
- Hugging Face for the Trainer API, which simplifies model training and deployment.
- The LoRA and quantization techniques for memory-efficient training.
- For the dataset:

  ```bibtex
  @misc{acar_arxiver2024,
    author       = {Alican Acar, Alara Dirik, Muhammet Hatipoglu},
    title        = {ArXiver},
    year         = {2024},
    publisher    = {Hugging Face},
    howpublished = {\url{https://huggingface.co/datasets/neuralwork/arxiver}}
  }
  ```