pubmed.md is a simple Streamlit application that scrapes the titles and abstracts of PubMed articles and returns them in an LLM-friendly format. The backend is a REST API built with Flask.
The first step, common to both the Docker and the source code setup approaches, is to clone the repository and move into it:
git clone https://github.com/AstraBert/pubmed-md.git
cd pubmed-md/
Once there, you can choose one of the two following approaches:
Required: Docker and Docker Compose
- Launch the Docker application:
docker compose up backend -d
docker compose up frontend -d
- Or use the start script for your platform:
# On Linux/macOS
bash start_services.sh
# On Windows
.\start_services.ps1
You will see the application running on http://localhost:8501 and you will be able to use it. Depending on your connection and hardware, the first setup might take some time (up to 30 minutes). Later runs do not repeat this step.
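Because the first startup can be slow, it may help to wait for the frontend port before opening the browser. The following is a minimal sketch, not part of the repository: `wait_for_port` is a hypothetical helper, and the 30-minute default timeout simply mirrors the estimate above.

```python
import socket
import time


def wait_for_port(host, port, timeout=1800.0, interval=2.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper: it only checks that something is listening on the
    port, not that the Streamlit app has finished loading.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connection means a server is accepting on this port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Nothing is listening yet: give up after the deadline, else retry.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Example call for the frontend described above:
# wait_for_port("localhost", 8501)
```

A plain TCP check keeps the sketch free of third-party dependencies. It says nothing about whether the backend on port 5000 is ready, so the same call can be repeated for that port if needed.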
Required: Python
- Create and activate a virtual environment:
python3 -m venv .venv
# Linux/macOS
source .venv/bin/activate
# Windows PS
.venv\Scripts\Activate.ps1
# Windows CMD
.venv\Scripts\activate.bat
- Install all necessary dependencies:
python3 -m pip install -r requirements.txt
- Launch the REST API:
cd scripts/
gunicorn api:app -b 0.0.0.0:5000
- Launch the Streamlit app:
streamlit run app.py
You will see the application running on http://localhost:8501 and you will be able to use it.
The backend is a REST API with a single endpoint, pubmed. This endpoint receives a request from the frontend carrying the ID of the article whose title and abstract are to be scraped, then sends a request to NCBI through the Entrez interface available in Biopython. NCBI returns the result as XML text, which is then converted to markdown.
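The fetch-and-convert flow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in the repository: the function names `fetch_pubmed_xml` and `pubmed_xml_to_markdown`, the `SAMPLE_XML` record, and the markdown layout are all hypothetical. The fetch step uses `Bio.Entrez.efetch` and needs Biopython plus network access, while the conversion step uses only the standard library and the `ArticleTitle` and `AbstractText` elements of PubMed XML.

```python
import xml.etree.ElementTree as ET

# Made-up record that follows the general shape of PubMed XML.
SAMPLE_XML = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>12345</PMID>
      <Article>
        <ArticleTitle>An example title</ArticleTitle>
        <Abstract>
          <AbstractText Label="BACKGROUND">First part.</AbstractText>
          <AbstractText Label="RESULTS">Second part.</AbstractText>
        </Abstract>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""


def fetch_pubmed_xml(pmid, email):
    """Fetch the raw XML for one PubMed ID (needs network access and Biopython)."""
    from Bio import Entrez  # imported lazily so the converter runs without Biopython

    Entrez.email = email  # NCBI asks callers to identify themselves
    handle = Entrez.efetch(db="pubmed", id=pmid, retmode="xml")
    try:
        return handle.read()
    finally:
        handle.close()


def pubmed_xml_to_markdown(xml_text):
    """Turn PubMed XML into markdown: a title heading followed by the abstract."""
    root = ET.fromstring(xml_text)
    blocks = []
    for article in root.iter("PubmedArticle"):
        title_el = article.find(".//ArticleTitle")
        # itertext() keeps text nested inside inline tags such as <i> or <sub>.
        title = "".join(title_el.itertext()).strip() if title_el is not None else "Untitled"
        lines = [f"# {title}", ""]
        for part in article.iter("AbstractText"):
            text = "".join(part.itertext()).strip()
            label = part.get("Label")  # structured abstracts label each section
            lines.append(f"**{label.title()}:** {text}" if label else text)
            lines.append("")
        blocks.append("\n".join(lines).strip())
    return "\n\n".join(blocks)


if __name__ == "__main__":
    print(pubmed_xml_to_markdown(SAMPLE_XML))
```

Keeping the conversion separate from the network call means the markdown step can be checked offline against a saved XML response.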
Contributions are always welcome! Follow the contribution guidelines reported here.
The software is provided under the MIT license.