The ONNX Runtime Generative AI library makes it easy to deploy generative AI ONNX models using ONNX Runtime and DirectML. For more details, follow the documentation provided in the ONNX Runtime Generative AI GitHub repo.
Install the prerequisites:
pip install numpy transformers torch onnx onnxruntime onnxruntime-directml
pip install onnxruntime-genai --pre
Use the model builder to generate an optimized ONNX model from a Hugging Face model name:
python3 -m onnxruntime_genai.models.builder -m model_name -o path_to_output_folder -p precision -e execution_provider
For example, to build an fp16 Phi-2 model:
python -m onnxruntime_genai.models.builder -m microsoft/phi-2 -e dml -p fp16 -o ./models/phi2
To build an int4 Phi-2 model:
python -m onnxruntime_genai.models.builder -m microsoft/phi-2 -e dml -p int4 -o ./models/phi2
To build from a model already downloaded to disk, pass the local folder with -i instead of a model name:
python3 -m onnxruntime_genai.models.builder -i path_to_local_folder_on_disk -o path_to_output_folder -p precision -e execution_provider
The ORT Generative AI SDK provides a high-level abstraction that runs the full inference pipeline (tokenization, generation loop, and decoding) through its Generate() API.
Run the example code provided in the onnxruntime-genai repo:
git clone https://github.com/microsoft/onnxruntime-genai.git
cd onnxruntime-genai\examples\python
python model-qa.py -m {path to model folder} -ep dml