This application analyzes educational text passages to identify optimal intervention points by mapping content to specific academic skills. It uses large language models (LLMs) to intelligently detect, rate, and explain skill alignment—helping educators personalize instruction and improve learning outcomes.
See the full output in `output/combined_data_final.xlsx`.
- Uses a comprehensive taxonomy of 69 educational competencies
- Skills span a wide range of domains:
  - Science (e.g., life sciences, physics, earth science)
  - Social Studies (e.g., history, geography, civics)
  - Language Arts
  - Mathematics
  - Arts & Physical Education
  - Digital Literacy
- Powered by the Groq LLM API (`llama3-70b-8192`)
- Uses structured prompt templates to ensure consistency
- Low temperature (0.01) for deterministic, repeatable outputs
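A minimal sketch of what such a call could look like with the Groq Python SDK; the prompt text, sample passage, and field names below are illustrative placeholders, not the project's actual templates:

```python
import os
from groq import Groq  # Groq's official Python SDK

client = Groq(api_key=os.environ["GROQ_API_KEY"])

# Illustrative prompt; the real templates ask for skill tags, ratings,
# explanations, and supporting excerpts in a structured JSON format.
passage = "The water cycle moves water between the ocean, air, and land."
prompt = (
    "You are an educational content analyst. Identify which skills from the "
    "taxonomy this passage aligns with, and return JSON with the fields "
    "'skill_tags', 'rating' (0-10), 'explanation', and 'excerpt'.\n\n"
    f"Passage: {passage}"
)

response = client.chat.completions.create(
    model="llama3-70b-8192",
    messages=[{"role": "user", "content": prompt}],
    temperature=0.01,  # near-deterministic, repeatable outputs
)
print(response.choices[0].message.content)
```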
Each model response includes:
- Identified skill tag(s)
- Alignment rating (scale of 0–10)
- Pedagogical explanation
- Highlighted text excerpt supporting the alignment
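As a concrete illustration of that structure, a parsed response might look like the hypothetical example below (the field names are assumptions; the real schema is whatever the prompt template requests):

```python
# Hypothetical parsed response; the field names are placeholders for
# whatever schema the prompt template actually requests.
example_response = {
    "skill_tags": ["Earth Science: The Water Cycle"],
    "rating": 9,  # alignment on the 0-10 scale
    "explanation": (
        "The passage explicitly describes how water moves between the "
        "ocean, air, and land, directly supporting this competency."
    ),
    "excerpt": "moves water between the ocean, air, and land",
}
```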
The system pinpoints passages that:
- Strongly align with specific skills (ratings: 9–10)
- Show partial alignment or emerging understanding (ratings: 5–6)
- Offer opportunities for teacher-led discussion or review
- Map to multiple skills when relevant
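One way to surface those intervention points is a simple rating-band filter over the scored results; a minimal pandas sketch, assuming hypothetical column names `passage_id`, `skill_tag`, and `rating`:

```python
import pandas as pd

# Hypothetical scored results; in the real pipeline these rows come from
# the parsed LLM responses.
df = pd.DataFrame(
    {
        "passage_id": [1, 2, 3],
        "skill_tag": ["Life Sciences", "Civics", "Geometry"],
        "rating": [10, 5, 7],
    }
)

strong = df[df["rating"] >= 9]             # strong alignment (9-10)
emerging = df[df["rating"].between(5, 6)]  # partial / emerging understanding (5-6)

print(strong, emerging, sep="\n\n")
```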
- Python-based processing pipeline
- Structured prompt engineering with JSON-formatted output
- LLM output stored and analyzed using DataFrames
- Embedding-based dataset joins to reduce hallucinations
- Final output: Excel reports for easy review & collaboration
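The embedding-based join listed above is meant to snap free-text skill names returned by the LLM back onto the canonical taxonomy rather than trusting the model's spelling. A rough sketch of the idea, assuming the `sentence-transformers` package (the model name and taxonomy entries are illustrative, not necessarily what this project uses):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

# Canonical taxonomy entries vs. free-text skill names returned by the LLM
# (note the deliberate typo in the second LLM skill).
taxonomy = ["Life Sciences: Ecosystems", "Civics: Branches of Government"]
llm_skills = ["ecosystems and food webs", "the three branches of goverment"]

tax_emb = model.encode(taxonomy, convert_to_tensor=True)
llm_emb = model.encode(llm_skills, convert_to_tensor=True)

# Each LLM-returned skill is snapped to its closest canonical entry, so
# misspelled or hallucinated tags never reach the final report.
scores = util.cos_sim(llm_emb, tax_emb)
for skill, row in zip(llm_skills, scores):
    best = int(row.argmax())
    print(f"{skill!r} -> {taxonomy[best]} (similarity={float(row[best]):.2f})")
```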
- Sign up at Groq and obtain a free API key.
- Set your API key as an environment variable: `export GROQ_API_KEY='your-api-key-here'`
- Install the dependencies (ideally inside a virtual environment): `pip install -r requirements.txt`
- Run the story processing script: `python run_01_process_stories.py`
- Generate the final AI output: `python run_02_combine_data.py` (output: `combined_data_final.xlsx`)
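Once the run completes, the report can be pulled back into pandas for spot checks (a sketch; the column names depend on the actual export, and `openpyxl` is needed for `.xlsx` files):

```python
import pandas as pd

# Load the generated report for a quick review.
df = pd.read_excel("output/combined_data_final.xlsx")
print(df.head())
print(df.columns.tolist())
```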
- Create a reference dataset with human-reviewed examples.
- Use these examples as few-shot prompts to guide and improve LLM output quality.
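A sketch of how those human-reviewed examples could be folded into the prompt as few-shot demonstrations; the example data, helper name `build_few_shot_prompt`, and formatting are illustrative assumptions:

```python
# Hypothetical human-reviewed examples used as few-shot demonstrations.
reference_examples = [
    {
        "passage": "Plants use sunlight to make their own food.",
        "skill_tags": ["Life Sciences: Photosynthesis"],
        "rating": 10,
        "explanation": "Directly states the core idea of photosynthesis.",
    },
]

def build_few_shot_prompt(passage: str) -> str:
    """Prepend reviewed examples so the LLM imitates their structure and rigor."""
    shots = "\n\n".join(
        f"Passage: {ex['passage']}\n"
        f"Skills: {ex['skill_tags']}\n"
        f"Rating: {ex['rating']}\n"
        f"Explanation: {ex['explanation']}"
        for ex in reference_examples
    )
    return (
        "Analyze the passage the same way as in these reviewed examples.\n\n"
        f"{shots}\n\nPassage: {passage}\nSkills:"
    )

print(build_few_shot_prompt("Volcanoes form where magma reaches the surface."))
```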
- Randomly sample and review LLM-generated outputs.
- Human raters evaluate skill alignment, clarity, and pedagogical value.
- Compare human and model ratings to fine-tune prompts.
- Identify skill categories or content formats where the model underperforms.
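Comparing the two sets of ratings can be as simple as a rank correlation plus a mean absolute difference over the sampled items; a minimal sketch with made-up numbers:

```python
import pandas as pd

# Made-up sample of paired ratings, for illustration only.
sample = pd.DataFrame(
    {
        "model_rating": [9, 6, 10, 5, 8],
        "human_rating": [8, 5, 10, 7, 8],
    }
)

agreement = sample["model_rating"].corr(sample["human_rating"], method="spearman")
mean_abs_diff = (sample["model_rating"] - sample["human_rating"]).abs().mean()

print(f"Spearman agreement: {agreement:.2f}")
print(f"Mean absolute rating difference: {mean_abs_diff:.2f}")
# Large gaps flag skill categories or content formats to revisit in the prompts.
```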
- Fine-tune a model to serve as a proxy reviewer.
- Helps reduce reliance on manual reviews for future outputs.
- Run controlled comparisons of LLM-generated interventions.
- Use engagement or comprehension metrics to assess effectiveness.
- Improve LLM output validation and error handling
- Implement a scalable LLM-as-a-Judge system for reviews
- Add another prompt for skills assessment
- Add dynamic text highlighting based on skill strength
- Integrate student engagement metrics for optimization
- Visualize and track skill dependencies across stories