The ONNX Runtime Generative AI library makes it easy to deploy generative AI ONNX models using ONNX Runtime and DirectML. For more details, follow the documentation provided in the ONNX Runtime Generative AI GitHub repo.
Install the prerequisites:
pip install numpy transformers torch onnx onnxruntime onnxruntime-directml
pip install onnxruntime-genai --pre
Use the model builder to generate an optimized ONNX model from a Hugging Face model name:
python3 -m onnxruntime_genai.models.builder -m model_name -o path_to_output_folder -p precision -e execution_provider
For example, to build an fp16 Phi-2 model:
python -m onnxruntime_genai.models.builder -m microsoft/phi-2 -e dml -p fp16 -o ./models/phi2
To build an int4 Phi-2 model:
python -m onnxruntime_genai.models.builder -m microsoft/phi-2 -e dml -p int4 -o ./models/phi2
To build from a model already downloaded to disk, pass the local folder with -i instead of a model name:
python3 -m onnxruntime_genai.models.builder -i path_to_local_folder_on_disk -o path_to_output_folder -p precision -e execution_provider
The ORT Generative AI SDK provides a high-level abstraction that runs the full inference pipeline (tokenization, generation loop, and decoding) through its Generate() API.
Run the example code provided in the onnxruntime-genai repo:
git clone https://github.com/microsoft/onnxruntime-genai.git
cd onnxruntime-genai\examples\python
python model-qa.py -m {path to model folder} -ep dml