Skip to content

danielstankw/WordWatchSummary

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

10 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

"WordWatch Summary"

The goal of this Python application is to streamline the process of finding, summarizing, and sharing news articles that mention a specific word of interest. Here’s how it works:

  1. Search and Scrape: The application searches for the specified word on Google, identifying occurrences in various articles, and collects the corresponding URLs.
  2. Summarization: Using a chosen large language model (LLM), the application generates concise summaries of the scraped articles.
  3. Editing: The summaries are reviewed and edited for clarity and coherence.
  4. Email Delivery: The edited summaries are embedded into an email and sent to the user via SMTP.

An example of email

alt text

This application is a proof of concept and can be easily expanded.

Installation

a. Ollama (Local LLM) - for details check Video

To provide summarization an Ollama framework was used, together with the gemma:2b model.

docker run -d --gpus=all -v <path_where_volume_is_mounted>:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

To stop or start a container

docker stop ollama
docker start ollama

To run the gemma:2b model - see the list of models

docker exec ollama ollama run gemma:2b

b. Set up local SMTP (Gmail)

Link to the YouTube video

How to run?

Create a virtual environment based on requirements.txt.
Modify the main.py and run the script.
Make sure to define the env variables in .env, see the .env.example for the required variables.

    # Find pages with:
    # all thses words:
    search.set_all_words("")
    # his exact word or phrase:
    search.set_exact_phrase("Johny Bravo")
    # any of these words:
    search.set_any_words("")
    # none of these words:
    search.set_none_words("")
    # numbers ranging from:
    search.set_number_range("", "")
    # Then narrow the results by:
    # language
    search.set_language("en")
    # region
    search.set_region("")
    # last update
    # d - day, w - week, m - month, y - year, all - anytime
    search.set_last_update("w")
    # site or domain
    search.set_site_or_domain("")
    # terms appearing
    search.set_terms_appearing("")
    # file type
    search.set_file_type("")
    # usage rights
    search.set_usage_rights("")

The fields are the same as in Google Advanced Search:

alt text

About

Find, scrape , summarize (LLM) and deliver!

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Languages