ORT-DML_QuickStart.md

Generative AI models with ONNX Runtime and DirectML

The ONNX Runtime Generative AI library makes it easy to deploy generative AI ONNX models using ONNX Runtime and DirectML. For more details, follow the documentation in the ONNX Runtime Generative AI GitHub repo.

```
pip install numpy transformers torch onnx onnxruntime onnxruntime-directml
pip install onnxruntime-genai --pre
python -m onnxruntime_genai.models.builder -m model_name -o path_to_output_folder -p precision -e execution_provider
```

For example, to build an fp16 Phi-2 model:

```
python -m onnxruntime_genai.models.builder -m microsoft/phi-2 -e dml -p fp16 -o ./models/phi2
```

To build an int4 Phi-2 model:

```
python -m onnxruntime_genai.models.builder -m microsoft/phi-2 -e dml -p int4 -o ./models/phi2
```

To build from a model already downloaded to a local folder on disk:

```
python -m onnxruntime_genai.models.builder -i path_to_local_folder_on_disk -o path_to_output_folder -p precision -e execution_provider
```

Run Inference

The ONNX Runtime Generative AI SDK provides a high-level abstraction for running the full inference pipeline through its Generate() API.
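As a minimal sketch, the pipeline loads the exported model, tokenizes a prompt, and decodes token by token. The API names below follow the onnxruntime-genai Python examples, but exact calls can differ between releases; the model path and prompt are illustrative and assume the builder step above already produced `./models/phi2`.

```python
# Hedged sketch of the ORT GenAI generate loop; API details may vary by version.
import os

try:
    import onnxruntime_genai as og  # installed via `pip install onnxruntime-genai --pre`
except ImportError:
    og = None

model_path = "./models/phi2"  # assumed output folder from the builder step above

if og is not None and os.path.exists(model_path):
    model = og.Model(model_path)               # loads the exported model + config
    tokenizer = og.Tokenizer(model)
    params = og.GeneratorParams(model)
    params.set_search_options(max_length=200)  # cap the total sequence length
    generator = og.Generator(model, params)
    generator.append_tokens(tokenizer.encode("What is DirectML?"))
    while not generator.is_done():             # token-by-token decode loop
        generator.generate_next_token()
    message = tokenizer.decode(generator.get_sequence(0))
else:
    message = "onnxruntime-genai or the model folder is unavailable; run the steps above first."

print(message)
```

The higher-level Generate() API wraps this same loop; the token-level form shown here is what the repo's streaming examples use.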

Run the example code provided in the ORT GenAI repo:

```
git clone https://github.com/microsoft/onnxruntime-genai.git
cd onnxruntime-genai\examples\python
python model-qa.py -m {path to model folder} -ep dml
```
