The goal of this Python application is to streamline the process of finding, summarizing, and sharing news articles that mention a specific word of interest. Here’s how it works:
- Search and Scrape: The application searches for the specified word on Google, identifying occurrences in various articles, and collects the corresponding URLs.
- Summarization: Using a chosen large language model (LLM), the application generates concise summaries of the scraped articles.
- Editing: The summaries are reviewed and edited for clarity and coherence.
- Email Delivery: The edited summaries are embedded into an email and sent to the user via SMTP.
An example of the resulting email
This application is a proof of concept and can be easily expanded.
a. Ollama (local LLM) - for details, see the video
Summarization uses the Ollama framework together with the gemma:2b model. Start the Ollama container with GPU support:
docker run -d --gpus=all -v <path_where_volume_is_mounted>:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
To stop or start the container:
docker stop ollama
docker start ollama
To run the gemma:2b model (see the list of models):
docker exec ollama ollama run gemma:2b
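Once the container is running, the model can also be reached over HTTP from Python. The following is a minimal sketch, not the project's actual code: it assumes the container is listening on localhost:11434 (as in the docker command above) and uses Ollama's /api/generate endpoint. The function names and the prompt wording are hypothetical.

```python
import json
import urllib.request

# Assumption: the Ollama container is reachable on the port published above.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_summary_request(article_text: str, model: str = "gemma:2b") -> urllib.request.Request:
    """Build a POST request asking the model for a short summary."""
    payload = {
        "model": model,
        # Hypothetical prompt wording; the project's real prompt is not shown here.
        "prompt": f"Summarize the following article in three sentences:\n\n{article_text}",
        "stream": False,  # request a single JSON object instead of a token stream
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize(article_text: str, timeout: float = 120.0) -> str:
    """Send the request and return the generated text (needs a running container)."""
    request = build_summary_request(article_text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["response"].strip()
```

With the container started and the model pulled, calling summarize(text) should return the summary as a plain string. Building the request in a separate function keeps the payload easy to inspect without touching the network.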
b. Set up local SMTP (Gmail)
Link to the YouTube video
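For orientation, sending the summaries through Gmail's SMTP server could look roughly like the sketch below. It is an illustration under stated assumptions rather than the project's implementation: the environment variable names SMTP_USER and SMTP_PASSWORD and both function names are hypothetical (check .env.example for the real variable names), and Gmail normally expects an app password rather than the account password.

```python
import os
import smtplib
from email.message import EmailMessage


def build_email(sender: str, recipient: str, summaries: dict[str, str]) -> EmailMessage:
    """Put one 'URL + summary' paragraph per article into a plain-text email."""
    message = EmailMessage()
    message["Subject"] = "News summaries"
    message["From"] = sender
    message["To"] = recipient
    paragraphs = [f"{url}\n{summary}" for url, summary in summaries.items()]
    message.set_content("\n\n".join(paragraphs))
    return message


def send_email(message: EmailMessage) -> None:
    """Send the message via smtp.gmail.com using STARTTLS on port 587."""
    # Hypothetical variable names; the project's .env.example lists the real ones.
    user = os.environ["SMTP_USER"]
    password = os.environ["SMTP_PASSWORD"]  # for Gmail, typically an app password
    with smtplib.SMTP("smtp.gmail.com", 587) as server:
        server.starttls()  # upgrade the connection to TLS before logging in
        server.login(user, password)
        server.send_message(message)
```

Keeping message construction separate from delivery means the email body can be checked locally before any credentials are configured.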
Create a virtual environment based on requirements.txt.
Modify main.py and run the script.
Make sure to define the env variables in .env; see .env.example for the required variables.
# Find pages with:
# all these words:
search.set_all_words("")
# this exact word or phrase:
search.set_exact_phrase("Johny Bravo")
# any of these words:
search.set_any_words("")
# none of these words:
search.set_none_words("")
# numbers ranging from:
search.set_number_range("", "")
# Then narrow the results by:
# language
search.set_language("en")
# region
search.set_region("")
# last update
# d - day, w - week, m - month, y - year, all - anytime
search.set_last_update("w")
# site or domain
search.set_site_or_domain("")
# terms appearing
search.set_terms_appearing("")
# file type
search.set_file_type("")
# usage rights
search.set_usage_rights("")
The fields are the same as in Google Advanced Search.
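To make the correspondence concrete, a few of these fields can be mapped onto the query parameters that the Google Advanced Search form appears to use. This is only a sketch: build_advanced_search_url is a hypothetical helper, the parameter names (as_q, as_epq, as_oq, as_eq, lr, as_qdr, as_sitesearch, as_filetype) reflect the form's observed behavior rather than a documented API, and the project's own search object may build its query differently.

```python
from urllib.parse import urlencode


def build_advanced_search_url(
    all_words: str = "",
    exact_phrase: str = "",
    any_words: str = "",
    none_words: str = "",
    language: str = "",
    last_update: str = "",
    site_or_domain: str = "",
    file_type: str = "",
) -> str:
    """Translate advanced-search fields into a Google search URL (hypothetical helper)."""
    # Assumed parameter names, taken from how the Advanced Search form submits its fields.
    params = {
        "as_q": all_words,
        "as_epq": exact_phrase,
        "as_oq": any_words,
        "as_eq": none_words,
        "lr": f"lang_{language}" if language else "",
        "as_qdr": last_update,  # d, w, m, y, or all
        "as_sitesearch": site_or_domain,
        "as_filetype": file_type,
    }
    # Drop empty fields so the URL only carries the filters that were set.
    non_empty = {key: value for key, value in params.items() if value}
    return "https://www.google.com/search?" + urlencode(non_empty)


# Mirrors the settings shown above: exact phrase, English, last week.
print(build_advanced_search_url(exact_phrase="Johny Bravo", language="en", last_update="w"))
# prints https://www.google.com/search?as_epq=Johny+Bravo&lr=lang_en&as_qdr=w
```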