Comparative Evaluation of LSA-Based Summarization Against Traditional and Neural Approaches Using Cosine Similarity
Abstract
This study presents a comparative evaluation of Latent Semantic Analysis (LSA)-based extractive summarization against traditional statistical and neural approaches, using cosine similarity as the principal evaluation metric. The methodology involves implementing an LSA summarizer on structured textual data, specifically a speech document, and analyzing its performance relative to Naïve Bayes and RankNet-based models. Key evaluation criteria include precision, recall, F1 score, and the semantic similarity between the original and summarized texts. Results show that while LSA marginally trails the neural models in performance, it significantly outperforms the traditional approaches and offers advantages in interpretability, computational efficiency, and adaptability. The study also examines how sentence scoring within the latent semantic space contributes to summary quality, and how summary length affects content retention. Visualizations of the data support these findings and highlight the model's semantic focus. Recommendations suggest deploying LSA in low-resource settings or as a component of hybrid systems. Future research directions include extending the approach to multi-document and multilingual summarization and integrating sentence compression. Overall, LSA is reaffirmed as a viable, adaptable, and efficient summarization method suitable for a range of real-world applications.
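To make the pipeline described above concrete, the sketch below shows one plausible way to implement LSA-based sentence scoring and a cosine-similarity comparison between an original text and its summary, using scikit-learn. The function names, parameters, and the naive sentence splitting are illustrative assumptions for this sketch, not the study's actual implementation.

```python
# A minimal sketch of LSA-based extractive summarization with a
# cosine-similarity check between the full text and the summary.
# All names and parameters here are illustrative assumptions.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity


def lsa_summarize(sentences, num_sentences=3, n_topics=2):
    """Score sentences in the latent semantic space and keep the top ones."""
    # Build a sentence-term matrix (rows = sentences, TF-IDF term weights).
    vectorizer = TfidfVectorizer(stop_words="english")
    X = vectorizer.fit_transform(sentences)

    # SVD projects each sentence into a low-rank latent topic space.
    svd = TruncatedSVD(n_components=min(n_topics, X.shape[1] - 1))
    topic_weights = svd.fit_transform(X)

    # One common heuristic: score each sentence by the magnitude of its
    # topic-space vector, then keep the top sentences in document order.
    scores = np.linalg.norm(topic_weights, axis=1)
    top = sorted(np.argsort(scores)[::-1][:num_sentences])
    return [sentences[i] for i in top]


def semantic_similarity(original, summary):
    """Cosine similarity between TF-IDF vectors of the text and its summary."""
    vectorizer = TfidfVectorizer(stop_words="english")
    vectors = vectorizer.fit_transform([original, summary])
    return cosine_similarity(vectors[0], vectors[1])[0, 0]


if __name__ == "__main__":
    text = ("The economy grew last quarter. Unemployment fell to a record low. "
            "The central bank held interest rates steady. Analysts expect "
            "inflation to remain moderate. Exports rose on strong demand.")
    # Naive sentence splitting, for illustration only.
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    summary = ". ".join(lsa_summarize(sentences, num_sentences=2)) + "."
    print("Summary:", summary)
    print("Cosine similarity:", round(semantic_similarity(text, summary), 3))
```

Ranking sentences by the norm of their topic-space vectors is only one of several LSA scoring strategies; variants weight each topic by its singular value or select one top sentence per topic, and the study's reported results may rest on a different choice.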