Jetson LLM Interface
8/8 Models Online
20-80 tokens/sec
Connected to Jetson
Select Model
🚀 TinyLlama 1.1B
• ~80 tokens/sec
• Ultra-fast lightweight model for simple queries and conversations
📚 Gemma 2B
• ~50 tokens/sec
• Google's efficient model optimized for educational and explanatory content
🎨 Llama 3.2 3B
• ~40 tokens/sec
• Meta's latest model for creative writing and general conversation
💻 Phi-3 3.8B
• ~35 tokens/sec
• Microsoft's compact model specialized for coding and technical tasks
🌏 Qwen 2.5 3B
• ~40 tokens/sec
• Alibaba's multilingual model with strong reasoning capabilities
🧠 Mistral 7B Instruct
• ~20 tokens/sec
• High-quality instruction-following model for complex reasoning
🔮 OpenHermes 2.5 7B
• ~22 tokens/sec
• Fine-tuned model for structured output and consistent responses
🎯 RAG-Enhanced (Auto-Select)
• Variable (context-aware)
• Intelligent model selection with knowledge retrieval from vector database
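The lineup above maps each model to a throughput figure and a purpose. A minimal sketch of how such a registry could be represented and queried; the `ModelInfo` dataclass and `pick_fastest` helper are illustrative assumptions, not part of the actual interface, while the names and speeds come from the list above:

```python
# Sketch of a model registry mirroring the picker above.
# ModelInfo and pick_fastest are hypothetical helpers, not the interface's real code.
from dataclasses import dataclass

@dataclass
class ModelInfo:
    name: str
    tokens_per_sec: float  # approximate throughput on the Jetson
    purpose: str

MODELS = [
    ModelInfo("TinyLlama 1.1B", 80, "fast, lightweight queries"),
    ModelInfo("Gemma 2B", 50, "educational/explanatory content"),
    ModelInfo("Llama 3.2 3B", 40, "creative writing, general chat"),
    ModelInfo("Phi-3 3.8B", 35, "coding and technical tasks"),
    ModelInfo("Qwen 2.5 3B", 40, "multilingual, strong reasoning"),
    ModelInfo("Mistral 7B Instruct", 20, "complex reasoning"),
    ModelInfo("OpenHermes 2.5 7B", 22, "structured output"),
]

def pick_fastest(models):
    """Return the model with the highest advertised throughput."""
    return max(models, key=lambda m: m.tokens_per_sec)

print(pick_fastest(MODELS).name)  # TinyLlama 1.1B
```

A real auto-select mode (like the RAG-Enhanced option) would weigh prompt content and retrieval context, not just raw speed; this sketch only shows the registry shape.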
Use Cases
💬 Conversational Agent
✍️ Content Generation
📊 Domain Analysis
🔍 Information Retrieval
System Performance
20-80 tokens/sec
Average Speed
💬 Conversation
Clear
Benchmark
Start a conversation or select an example prompt below
Try these examples:
Send
Loading...
Generating response...
Benchmark Results
Last response: | Time: s | Speed: tokens/sec
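The benchmark readout reports elapsed time and throughput for the last response. A minimal sketch of how tokens/sec could be computed, assuming a `generate` callable that returns the generated tokens (a stand-in, not the interface's actual API):

```python
import time

def benchmark(generate, prompt):
    """Time a generation call and report elapsed seconds and tokens/sec.
    `generate` is a hypothetical callable returning a list of generated tokens."""
    start = time.perf_counter()
    tokens = generate(prompt)
    elapsed = time.perf_counter() - start
    speed = len(tokens) / elapsed if elapsed > 0 else 0.0
    return {"time_s": round(elapsed, 2), "tokens_per_sec": round(speed, 1)}

# Example with a stubbed generator producing 40 tokens:
stats = benchmark(lambda p: ["tok"] * 40, "hello")
print(f"Time: {stats['time_s']} s | Speed: {stats['tokens_per_sec']} tokens/sec")
```

Throughput here is simply token count divided by wall-clock time, which is why lighter models like TinyLlama report higher numbers than the 7B models.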