Project in TDDE16

Final project in the course Text Mining (NLP)

Files

ext_summaries.ipynb contains model and test code
dataset_analysis.ipynb contains code to analyse the dataset used
TDDE16.pdf is the project report

Evaluation of Unsupervised Text Summarization Algorithms

This work investigates the potential performance gains of using sentence embeddings from S-BERT, compared with TFIDF, in extractive unsupervised text summarization models such as LSA, Clustering and TextRank. The CNN/Daily Mail dataset is used for evaluation, with ROUGE scores as metrics. The performance of these models is also measured over the number of sentences allowed in the summary, to study how well the algorithms scale with summary length.
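To make the extractive setting concrete, here is a minimal sketch of TextRank over TFIDF sentence vectors, written in plain Python. It is not the code from `ext_summaries.ipynb`: the function names, the regex-based sentence splitter, the smoothed IDF formula, and the damping and iteration defaults are all assumptions chosen for illustration. Swapping `tfidf_vectors` for S-BERT sentence embeddings would give the embedding-based variant described above.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter on ., ! and ? (an assumption, not the project's tokenizer)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tfidf_vectors(sentences):
    """Treat each sentence as a document and build sparse TFIDF vectors."""
    tokenized = [re.findall(r"[a-z0-9]+", s.lower()) for s in sentences]
    n = len(tokenized)
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Smoothed IDF, so words found in every sentence keep a small weight.
        vectors.append({w: c * (math.log((1 + n) / (1 + doc_freq[w])) + 1)
                        for w, c in counts.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def textrank_summary(text, num_sentences=3, damping=0.85, iterations=50):
    """Return the top-ranked sentences, kept in their original order."""
    sentences = split_sentences(text)
    n = len(sentences)
    if n <= num_sentences:
        return sentences
    vectors = tfidf_vectors(sentences)
    # Weighted similarity graph without self-loops.
    sim = [[cosine(vectors[i], vectors[j]) if i != j else 0.0
            for j in range(n)] for i in range(n)]
    out_weight = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    # Power iteration of the weighted PageRank update.
    for _ in range(iterations):
        scores = [
            (1 - damping) / n + damping * sum(
                sim[j][i] / out_weight[j] * scores[j]
                for j in range(n) if out_weight[j] > 0)
            for i in range(n)
        ]
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:num_sentences]
    return [sentences[i] for i in sorted(top)]
```

Calling `textrank_summary(article, num_sentences=k)` for increasing `k` is one way to study how a model behaves as the allowed summary length grows.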

Models that use sentence similarity as the main criterion for summary generation, such as Clustering, see the largest performance increase, at around 20% compared to using TFIDF. However, the models that perform better with TFIDF, namely TextRank and Weighted LSA, still perform best on the dataset. TextRank performs better on shorter summaries, while Weighted LSA performs better on summaries with four sentences.
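The comparisons above are based on ROUGE scores. As a rough illustration of what such a score measures, the following sketch computes ROUGE-N precision, recall and F1 from clipped n-gram overlap. It is a simplified stand-in, not the ROUGE package used in the project: the tokenizer and the function names `ngrams` and `rouge_n` are assumptions, and stemming and multi-reference handling are left out.

```python
import re
from collections import Counter


def ngrams(text, n):
    """Count lowercase word n-grams in a text (simple regex tokenizer, an assumption)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """ROUGE-N precision, recall and F1 between one candidate and one reference."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    # Counter intersection clips each n-gram count to the smaller of the two.
    overlap = sum((cand & ref).values())
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

For example, `rouge_n("the cat sat on the mat", "the cat lay on the mat")` shares five of six unigrams with the reference, so precision, recall and F1 are all 5/6.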

Evaluation outcome

Read more about evaluation, models, and discussion in the report.
