Azion AI Inference Models

Azion’s edge-optimized models span multiple AI domains including text generation, image analysis, embeddings, and more. Each model is designed to balance performance and resource efficiency for edge deployment.

This page lists the models available for use with Edge AI. To learn more about Edge AI, visit the Edge AI Reference.

Available Models

Mistral 3 Small (24B AWQ)

A compact language model that delivers capabilities comparable to those of larger models. It is well suited to conversational agents, function calling, fine-tuning, and local inference with sensitive data.

View details

BAAI/bge-reranker-v2-m3

A lightweight reranker model with strong multilingual capabilities. It is easy to deploy and offers fast inference.

View details

InternVL3

InternVL3 is an advanced multimodal large language model whose capabilities encompass tool usage, GUI agents, industrial image analysis, 3D vision perception, and more.

View details

Qwen2.5 VL AWQ 3B

A Vision Language Model (VLM) that offers advanced capabilities such as visual analysis, agentic reasoning, long video comprehension, visual localization, and structured output generation.

View details

Qwen2.5 VL AWQ 7B

View details

Qwen3 30B A3B Instruct 2507 FP8

An instruction-tuned 30B-parameter FP8 causal language model for long-context (256K) text generation and reasoning, supporting chat/QA, summarization, multilingual tasks, math/science problem solving, coding, and tool-augmented workflows.

View details

Qwen3 Embedding 4B

A 4B-parameter multilingual embedding model (36 layers, 32K context) that outputs 2560‑dim vectors for text/code retrieval, classification, clustering, and bitext mining. It supports instruction-conditioned embeddings and is optimized for efficient, cross-lingual representation learning.

View details
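As an illustration of the retrieval use case mentioned above, the following is a minimal sketch of ranking documents by cosine similarity between embedding vectors. It does not call Edge AI or any model: the helper names (`cosine_similarity`, `rank_by_similarity`, `toy_vector`) and the one-hot vectors are hypothetical stand-ins for real model output. Only the 2560-dimension size comes from this page.

```python
import math

EMBEDDING_DIM = 2560  # vector size stated for Qwen3 Embedding 4B on this page


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def toy_vector(hot_index):
    """Hypothetical placeholder for a model output: a one-hot vector."""
    vec = [0.0] * EMBEDDING_DIM
    vec[hot_index] = 1.0
    return vec


# Placeholder data; in practice these vectors would come from the model.
query = toy_vector(0)
docs = {"doc-a": toy_vector(1), "doc-b": toy_vector(0)}
ranking = rank_by_similarity(query, docs)
```

In a real pipeline the query and document vectors would be produced by the embedding model, and the ranking step would usually be delegated to a vector index rather than a linear scan.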

Nanonets-OCR-s

An OCR model that converts document images to structured Markdown, preserving layout (headings, lists, tables) and basic tags. The output is easy to parse and feed into LLM pipelines.

View details
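To show what "easy to parse" can look like in practice, here is a minimal sketch that extracts the first table from Markdown text into a list of row dictionaries. It assumes the table is emitted in pipe-delimited Markdown form, which this page does not confirm; other forms, such as HTML tables, are not handled. The function name `parse_markdown_table` and the sample text are hypothetical, not actual model output.

```python
def parse_markdown_table(markdown):
    """Extract the first pipe-delimited table as a list of row dicts."""
    rows = []
    header = None
    for line in markdown.splitlines():
        stripped = line.strip()
        is_table_line = stripped.startswith("|") and stripped.endswith("|")
        if not is_table_line:
            if header is not None:
                break  # the first table has ended
            continue
        cells = [cell.strip() for cell in stripped[1:-1].split("|")]
        if header is None:
            header = cells  # first table line holds the column names
        elif all(cell and set(cell) <= set("-:") for cell in cells):
            continue  # skip the separator row such as | --- | --- |
        else:
            rows.append(dict(zip(header, cells)))
    return rows


# Hypothetical sample text, written by hand for illustration.
sample = """# Invoice

| Item | Qty |
| --- | --- |
| Widget | 2 |
| Gadget | 5 |

Total due on receipt.
"""

table_rows = parse_markdown_table(sample)
```

The resulting list of dictionaries can be serialized to JSON or inserted into a prompt for a downstream language model.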

GPT-OSS 20B

An open-source OpenAI model with 20 billion parameters, designed for text generation, conversation, and other natural language processing tasks. It supports tool calling and a 131k-token context length.

View details