Holistic Evaluation of Language Models

Welcome! The crfm-helm Python package contains code used in the Holistic Evaluation of Language Models project (paper, website) by Stanford CRFM. This package includes the following features:

Collection of datasets in a standard format (e.g., NaturalQuestions)
Collection of models accessible via a unified API (e.g., GPT-3, MT-NLG, OPT, BLOOM)
Collection of metrics beyond accuracy (efficiency, bias, toxicity, etc.)
Collection of perturbations for evaluating robustness and fairness (e.g., typos, dialect)
Modular framework for constructing prompts from datasets
Proxy server for managing accounts and providing a unified interface to access models
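To make two of the features above concrete (constructing prompts from datasets, and perturbing inputs to probe robustness), here is a minimal, self-contained sketch. It does not use the crfm-helm API: `Instance`, `build_prompt`, and `perturb_typos` are hypothetical names invented for illustration, and the adjacent-letter swap is only one simple stand-in for a typo perturbation.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Instance:
    """Hypothetical dataset record (not a HELM class): a question and its reference answer."""
    question: str
    answer: str


def build_prompt(train: List[Instance], test: Instance, instructions: str = "") -> str:
    """Format few-shot examples, then the test question with the answer left blank."""
    blocks = [instructions] if instructions else []
    for ex in train:
        blocks.append(f"Question: {ex.question}\nAnswer: {ex.answer}")
    blocks.append(f"Question: {test.question}\nAnswer:")
    return "\n\n".join(blocks)


def perturb_typos(text: str, prob: float, seed: int = 0) -> str:
    """Swap adjacent letter pairs with probability `prob` to simulate typos."""
    rng = random.Random(seed)  # seeded so the perturbation is reproducible
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        if chars[i].isalpha() and chars[i + 1].isalpha() and rng.random() < prob:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip the swapped pair so each letter moves at most once
        else:
            i += 1
    return "".join(chars)


if __name__ == "__main__":
    train = [Instance("What is the capital of France?", "Paris")]
    test = Instance("What is the capital of Japan?", "Tokyo")

    # Clean prompt and a perturbed variant of the same test question.
    print(build_prompt(train, test))
    noisy_test = Instance(perturb_typos(test.question, prob=0.2), test.answer)
    print(build_prompt(train, noisy_test))
```

Comparing a model's answers on the clean and perturbed prompts gives a rough robustness signal. The perturbation changes only the test question, so the few-shot examples stay identical across both runs.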

To get started, refer to the documentation on Read the Docs for how to install and run the package.

Directory Structure

The directory structure for this repo is as follows:

├── docs # MD used to generate readthedocs
│
├── scripts # Python utility scripts for HELM
│   ├── cache
│   ├── data_overlap # Calculate train test overlap
│   │   ├── common
│   │   ├── scenarios
│   │   └── test
│   ├── efficiency
│   ├── fact_completion
│   ├── offline_eval
│   └── scale
└── src
    ├── helm # Benchmarking Scripts for HELM
    │   │
    │   ├── benchmark # Main Python code for running HELM
    │   │   │
    │   │   └── static # Current JS (Jquery) code for rendering front-end
    │   │       │
    │   │       └── ...
    │   │
    │   ├── common # Additional Python code for running HELM
    │   │
    │   └── proxy # Python code for external web requests
    │
    └── helm-frontend # New React Front-end

