Embedl

https://www.embedl.com

Embedl develops advanced tools and algorithms for Edge AI. Our mission is to make AI models run faster, more energy-efficiently, and more reliably across diverse hardware platforms, while significantly reducing development time.

We help teams deploy high-performance AI on real-world, resource-constrained devices.

Embedl Models (Community)

Pre-optimized models that can be used off-the-shelf or customized for the specific hardware targets supported by the embedl-models package.

First release highlights:

- The fastest Small Language Models (SLMs), using FlashHead, a novel architectural improvement to the language-model head
- Works with popular models like Llama, Gemma, and Qwen
- Provides speedups on top of:
  - Quantization
  - Flash Attention
  - Other standard optimizations

Device: NVIDIA Jetson Thor

Model	Generation speed (tokens/s)
embedl/Llama-3.2-3B-Instruct-FlashHead-W4A16	100
Llama-3.2-3B-Instruct-W4A16*	80
RedHatAI/Llama-3.2-3B-Instruct-FP8	64
meta-llama/Llama-3.2-3B-Instruct	37

*Model quantized by Embedl for benchmarking. It is similar to the FlashHead-W4A16 model but lacks FlashHead and the custom generation loop.
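To make the table easier to read as relative gains, the short sketch below divides each reported generation speed by a chosen reference. The numbers are copied from the table above. The helper name `relative_speedups` and the dictionary layout are illustrative assumptions and are not part of the embedl-models package.

```python
# Generation speeds (tokens/s) copied from the Jetson Thor table above.
SPEEDS = {
    "embedl/Llama-3.2-3B-Instruct-FlashHead-W4A16": 100,
    "Llama-3.2-3B-Instruct-W4A16": 80,
    "RedHatAI/Llama-3.2-3B-Instruct-FP8": 64,
    "meta-llama/Llama-3.2-3B-Instruct": 37,
}


def relative_speedups(speeds, reference):
    """Divide every model's tokens/s by the reference model's tokens/s.

    Hypothetical helper for illustration only.
    """
    ref = speeds[reference]
    return {name: value / ref for name, value in speeds.items()}


# Speedup of every model relative to the unquantized base model.
vs_base = relative_speedups(SPEEDS, "meta-llama/Llama-3.2-3B-Instruct")

# Gain attributable to FlashHead and the custom generation loop,
# measured against the otherwise similar W4A16 baseline.
flashhead_gain = (
    SPEEDS["embedl/Llama-3.2-3B-Instruct-FlashHead-W4A16"]
    / SPEEDS["Llama-3.2-3B-Instruct-W4A16"]
)

if __name__ == "__main__":
    for name, ratio in vs_base.items():
        print(f"{name}: {ratio:.2f}x")
    print(f"FlashHead vs. W4A16 baseline: {flashhead_gain:.2f}x")
```

Read this way, the FlashHead model is 1.25 times as fast as the W4A16 baseline (100 vs. 80 tokens/s) and about 2.7 times as fast as the unquantized model (100 vs. 37 tokens/s). These ratios are simple arithmetic on the reported figures, not additional measurements.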

Contact

Headquarters (Sweden)
Gamla Almedalsvägen 39
412 63 Gothenburg, Sweden

Email: contact@embedl.com
