
Evaluation of Unsupervised Text Summarization Models (TDDE16 Project)


Sakib1418/tdde16-proj

 
 


Project in TDDE16

Final project in the course Text Mining (NLP)

Files

  • ext_summaries.ipynb contains the model and test code
  • dataset_analysis.ipynb contains the code used to analyse the dataset
  • TDDE16.pdf is the project report

Evaluation of Unsupervised Text Summarization Algorithms

This work investigates the potential performance gains from using S-BERT sentence embeddings instead of TF-IDF vectors in extractive, unsupervised text summarization models such as LSA, Clustering, and TextRank. The models are evaluated on the CNN/Daily Mail dataset using ROUGE scores. Their performance is also measured as a function of the number of sentences allowed in the summary, to study how well the algorithms scale with summary length.
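To make the TF-IDF baseline concrete, here is a minimal sketch of TextRank over TF-IDF sentence vectors: each sentence is treated as a document, sentences are linked by cosine similarity, and a weighted PageRank power iteration ranks them. It is written from scratch in plain Python for illustration and is not the code in ext_summaries.ipynb; the tokenizer, the smoothed IDF formula, the damping factor of 0.85 and all function names are assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Simplifying assumption: lowercase regex tokens, no stemming or stop-word removal.
    return re.findall(r"[a-z0-9']+", text.lower())


def tfidf_vectors(sentences):
    """Treat each sentence as a document; return sparse TF-IDF vectors as {term: weight}."""
    tokenized = [tokenize(s) for s in sentences]
    n = len(tokenized)
    # Document frequency: the number of sentences each term appears in.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed IDF (an assumption): terms found in every sentence keep a small weight.
        vectors.append({t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def textrank(vectors, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration on the sentence-similarity graph (no self-loops)."""
    n = len(vectors)
    if n == 0:
        return []
    sim = [[cosine(vectors[i], vectors[j]) if i != j else 0.0 for j in range(n)]
           for i in range(n)]
    out_weight = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Each sentence j passes its score to neighbour i in proportion to sim[j][i].
        scores = [
            (1 - damping) / n
            + damping * sum(sim[j][i] / out_weight[j] * scores[j]
                            for j in range(n) if out_weight[j])
            for i in range(n)
        ]
    return scores


def summarize(sentences, k):
    """Pick the k highest-scoring sentences and return them in document order."""
    scores = textrank(tfidf_vectors(sentences))
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


doc = [
    "The cat sat on the mat.",
    "A dog chased the cat around the mat.",
    "Stock markets fell sharply on Monday.",
    "The cat and the dog later slept on the mat.",
]
print(summarize(doc, 2))
```

Swapping TF-IDF for S-BERT amounts to replacing `tfidf_vectors` and `cosine` with dense sentence embeddings (for example from `SentenceTransformer.encode` in the `sentence-transformers` library) and a dense cosine similarity; the graph construction and ranking steps stay the same. Which S-BERT model the project used is not stated here.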

Models that use sentence similarity as the main criterion for generating summaries, such as Clustering, see the largest performance increase, around 20% compared with TF-IDF. However, the models that perform better with TF-IDF, namely TextRank and Weighted LSA, still perform best on the dataset. TextRank performs better on shorter summaries, while Weighted LSA performs better on summaries with four sentences.
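The performance figures above are ROUGE-based. As a rough illustration of what ROUGE-N measures, here is a minimal sketch that computes precision, recall and F1 from the clipped n-gram overlap between a candidate summary and a reference. It is a simplified re-implementation (regex tokenizer, no stemming, no ROUGE-L) and not necessarily the scorer behind the reported numbers; the function names are hypothetical.

```python
import re
from collections import Counter


def ngrams(text, n):
    """Multiset of word n-grams (simple lowercase regex tokenizer, an assumption)."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """ROUGE-N precision, recall and F1 from clipped n-gram overlap."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    # Counter '&' keeps the minimum count of each shared n-gram (clipping).
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


reference = "The cat sat on the mat."
candidate = "The cat lay on the mat."
print(round(rouge_n(candidate, reference, 1)["f1"], 3))  # 0.833 (5 of 6 unigrams shared)
print(round(rouge_n(candidate, reference, 2)["f1"], 3))  # 0.6 (3 of 5 bigrams shared)
```

Scoring the top 1, 2, 3, ... sentences of a model's ranking against the reference is one way to trace performance against summary length. For reportable results, a reference implementation such as Google's `rouge-score` package is the usual choice.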

Evaluation outcome

[Figure: perf, evaluation results]
Read more about the evaluation, the models, and the discussion in the report (TDDE16.pdf).
