This repository accompanies the YouTube tutorial demonstrating how to run a local large language model (LLM) using Hugging Face Transformers. The provided Python script implements a complete workflow for chat-style interactions with a locally stored model.
## Table of Contents

- Features
- Prerequisites
- Installation
- Usage
- Code Structure
- Configuration
- Contributing
- License
- Resources
## Features

- **Local Model Loading**: Load pretrained causal language models from local checkpoints
- **Chat Template Formatting**: Supports conversational message formatting
- **Smart Tokenization**: Includes attention masking and device allocation
- **Controlled Generation**: Configurable parameters:
  - Temperature
  - Top-p sampling
  - Max new tokens
- **Cross-Platform Support**: Works on CPU (CUDA supported for GPU acceleration)

See the sketch after this list for how these pieces fit together in code.
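The snippet below is a minimal, illustrative sketch of that workflow written against the Hugging Face Transformers API. The checkpoint path, example message, and sampling values are assumptions for illustration, not the repository's exact settings.

```python
# Minimal sketch of the chat workflow (illustrative; paths and parameter
# values are assumptions, not the repository's exact configuration).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "./SmolLM2-1.7B-Instruct"  # hypothetical local checkpoint directory
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path).to(device)

# Format a chat-style conversation with the model's chat template
messages = [{"role": "user", "content": "Explain what a tokenizer does."}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# Tokenize with an explicit attention mask and move tensors to the target device
inputs = tokenizer(prompt, return_tensors="pt").to(device)

# Controlled generation: temperature, top-p sampling, and a cap on new tokens
output_ids = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)

# Decode only the newly generated tokens
new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```

The same code runs unchanged on CPU; it simply selects a CUDA device automatically when one is available, which speeds up generation considerably.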
## Prerequisites

- Python 3.7+
- PyTorch (CPU/CUDA version)
- Hugging Face Transformers
- 8GB+ RAM (16GB recommended)
## Installation

```bash
# Clone repository
git clone https://github.com/portalbh/SmolLM2.git
cd SmolLM2

# Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate   # Linux/macOS
venv\Scripts\activate      # Windows

# Install dependencies
pip install torch transformers
```
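The command above installs a working CPU build of PyTorch on most platforms; for a CUDA-enabled build, use the install selector on pytorch.org. The optional check below (not part of the repository's script) confirms the install and reports whether a GPU is visible.

```python
# Optional sanity check (illustrative): verify the install and report the device
import torch
import transformers

print("transformers:", transformers.__version__)
print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
```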