Open to Collab

s3nh PRO

s3nh

s3nhxx
s3nh

AI & ML interests

Quantization, LLMs, Deep Learning for good. Follow me if you like my work. Patreon.com/s3nh

Recent Activity

liked a model 3 days ago

ysong21/entropy-v1-fp8

reacted to mitkox's post with 🚀 about 8 hours ago

Post

232

134,614 tok/sec input prefill max
1,031 tok/sec output generation max

At these local AI speeds, there is no User Interface for humans. My human UI is the Radicle distributed Git issues queue

On my GPU workstation:
- Z8 Fury G5 4x A6000
- MiniMax-M2.5
- Claude Code to localhost:8000

1 reply

reacted to Tonic's post with 🔥 7 days ago

Post

3115

🙋🏻‍♂️hello my lovelies ,

it is with great pleasure i present to you my working one-click, completely free Hugging Face Spaces deployment with 16GB RAM.

repo : https://huggingface.co/spaces/Tonic/hugging-claw/tree/main (use git clone to inspect)
literally the one-click link : https://huggingface.co/spaces/Tonic/hugging-claw?duplicate=true

you can also run it locally and see for yourself :

docker run -it -p 7860:7860 --platform=linux/amd64 \
  -e HF_TOKEN="YOUR_VALUE_HERE" \
  -e OPENCLAW_GATEWAY_TRUSTED_PROXIES="YOUR_VALUE_HERE" \
  -e OPENCLAW_GATEWAY_PASSWORD="YOUR_VALUE_HERE" \
  -e OPENCLAW_CONTROL_UI_ALLOWED_ORIGINS="YOUR_VALUE_HERE" \
  registry.hf.space/tonic-hugging-claw:latest

just a few quite minor details i'll take care of but i wanted to share here first
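
For anyone who would rather script the local run than paste the command, here is a minimal Python sketch that assembles the same docker run invocation as an argument list. The helper name `build_docker_command` and the idea of passing the settings as a dict are assumptions made for illustration; only the flags, environment variable names, port, and image come from the post.

```python
import shlex

# Image and variable names as given in the post.
IMAGE = "registry.hf.space/tonic-hugging-claw:latest"
ENV_NAMES = [
    "HF_TOKEN",
    "OPENCLAW_GATEWAY_TRUSTED_PROXIES",
    "OPENCLAW_GATEWAY_PASSWORD",
    "OPENCLAW_CONTROL_UI_ALLOWED_ORIGINS",
]


def build_docker_command(values, host_port=7860):
    """Hypothetical helper: return the docker argv list.

    Raises KeyError if one of the expected settings is missing.
    """
    cmd = [
        "docker", "run", "-it",
        "-p", f"{host_port}:7860",
        "--platform=linux/amd64",
    ]
    for name in ENV_NAMES:
        cmd += ["-e", f"{name}={values[name]}"]
    cmd.append(IMAGE)
    return cmd


if __name__ == "__main__":
    placeholders = {name: "YOUR_VALUE_HERE" for name in ENV_NAMES}
    # Print a shell-quoted version for inspection before running anything.
    print(shlex.join(build_docker_command(placeholders)))
```

Building a list rather than a single string means the result can be handed to `subprocess.run` without shell quoting issues, though that step is left out here.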

2 replies

reacted to MonsterMMORPG's post with 🔥 9 days ago

Post

2970

SECourses Musubi Trainer upgraded to V27 and FLUX 2, FLUX Klein, Z-Image training added with demo configs - amazing VRAM optimized - read the news

App is here : https://www.patreon.com/posts/137551634

Full tutorial how to use and train : https://youtu.be/DPX3eBTuO_Y

1 reply

reacted to codelion's post with 🔥 11 days ago

Post

6126

Introducing Dhara-70M: A diffusion language model that achieves 3.8x higher throughput than autoregressive models!

Key findings from our research on optimal architectures for small language models:

→ Depth beats width: 32 layers outperform 12 layers at the same parameter count
→ Best-in-class factuality: 47.5% on TruthfulQA
→ 10x training efficiency using WSD (Warmup-Stable-Decay) conversion
→ Canon layers add only 0.13% parameters but improve reasoning
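
For readers unfamiliar with WSD, the sketch below shows the general shape of a Warmup-Stable-Decay learning-rate schedule: a ramp up, a flat plateau, then a decay. The post does not give schedule details or say how the schedule is used during conversion, so the peak rate, the warmup and decay fractions, and the linear decay shape are all assumptions chosen for illustration.

```python
def wsd_lr(step, total_steps, peak_lr=1e-3, warmup_frac=0.1,
           decay_frac=0.2, min_lr=0.0):
    """Learning rate at `step` under a Warmup-Stable-Decay schedule.

    All default values are illustrative assumptions, not the post's settings.
    """
    warmup_steps = int(total_steps * warmup_frac)
    decay_steps = int(total_steps * decay_frac)
    decay_start = total_steps - decay_steps

    if step < warmup_steps:
        # Warmup: ramp linearly up to the peak rate.
        return peak_lr * (step + 1) / warmup_steps
    if step < decay_start:
        # Stable: hold the peak rate.
        return peak_lr
    # Decay: move linearly from the peak towards min_lr.
    progress = (step - decay_start) / max(decay_steps, 1)
    return peak_lr + (min_lr - peak_lr) * progress


if __name__ == "__main__":
    for s in (0, 99, 500, 800, 999):
        print(s, wsd_lr(s, total_steps=1000))
```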

We trained on 1B tokens using the optimal 50-30-20 dataset mix (PDFs + filtered web + educational content), then converted to diffusion with just 100M additional tokens.
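
As a quick sanity check on the token budget described above, this small sketch splits 1B tokens by the 50-30-20 mix. It assumes the percentages map to the sources in the order they are listed (PDFs, filtered web, educational content), which the post implies but does not state outright; the function name is hypothetical.

```python
def split_tokens(total_tokens, mix):
    """Split a token budget according to integer percentages that sum to 100."""
    if sum(mix.values()) != 100:
        raise ValueError("mix percentages must sum to 100")
    return {name: total_tokens * pct // 100 for name, pct in mix.items()}


if __name__ == "__main__":
    # Assumed mapping of 50-30-20 to the three sources named in the post.
    mix = {"pdfs": 50, "filtered_web": 30, "educational": 20}
    budget = split_tokens(1_000_000_000, mix)
    for name, tokens in budget.items():
        print(f"{name}: {tokens:,} tokens")
    # The diffusion conversion uses 100M extra tokens, i.e. 10% of pretraining.
    print(f"conversion overhead: {100_000_000 / 1_000_000_000:.0%}")
```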

Blog: https://huggingface.co/blog/codelion/optimal-model-architecture
Model: codelion/dhara-70m

1 reply

reacted to giux78's post with 🔥 16 days ago

Post

188

Together with @mferraretto and @efederici we released #Nesso-4B, a new model specialized for agentic workflows.

mii-llm/nesso-4B

#Nesso-4B is a fine-tuned version of Qwen-4B, trained on a highly curated and balanced dataset designed specifically for multilingual agentic workflows and conversational use cases.

As shown in the video below, we simulate the new “cowork” from #Anthropic without any data sharing, all running on a consumer device. The model can be used to build agentic behavior in #privateAI environments.

Not every problem requires super intelligence: in many cases, intelligence at the edge is more than enough.

#Nesso4B #AgenticAI #PrivateAI #EdgeAI #OnDeviceAI

2 replies

reacted to AdinaY's post with 🔥 16 days ago

Post

370

GLM just entered the OCR field🔥

zai-org/GLM-OCR

✨ 0.9B
✨ MIT licensed
✨ Multimodal GLM-V architecture
✨ #1 on OmniDocBench v1.5 (94.62)

reacted to raincandy-u's post with 🔥 16 days ago

Post

2947

Introducing Rain-v2: Democratizing LLM training on gaming GPUs! ⚡

Following Rain-100M, we’re scaling up. Rain-v2 features a larger training dataset.

We’ve published a comprehensive blog covering the end-to-end journey—from raw data collection to rigorous evaluation and safety testing.

HF Repo: 🤗 raincandy-u/Rain-v2

Blog: 📚 https://angelkawaii.xyz/2026/01/29/rain-v2/

Special thanks to the open-source community and the SmolLM2 team for their foundational work! 🚀

HuggingFaceTB
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model (2502.02737)

reacted to raincandy-u's post with 👍🔥 29 days ago

Post

5422

🤗 Just released Rain-100M, an experimental ~97M-parameter Qwen3-style language model trained from random initialization.

Repo: raincandy-u/Rain-100M

Data: HuggingFaceFW/fineweb-edu, ~3B tokens, English only

Tokenizer: custom 16k BPE, context length 4096

Architecture: 12 Transformer layers, hidden size 768, 12 heads, MLP 2048, SiLU, bf16

Rain-100M is a raw base model (not instruction-tuned or safety-aligned), aimed at small-scale research, debugging training pipelines, and CPU/edge experiments. If you run evaluations, finetunes, or visualizations with it, I would be very interested in your results!
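
The ~97M figure can be roughly reproduced from the architecture numbers in the post. The estimate below is a back-of-the-envelope sketch, not the model's actual accounting: it assumes plain multi-head attention with no biases, a gated MLP with three weight matrices, two norm vectors per layer, tied input and output embeddings, and a vocabulary of exactly 16,384 entries for the "16k" tokenizer.

```python
def estimate_params(vocab_size, hidden, layers, mlp_dim, tie_embeddings=True):
    """Rough parameter count for a decoder-only Transformer (assumptions above)."""
    attention = 4 * hidden * hidden   # Q, K, V and output projections
    mlp = 3 * hidden * mlp_dim        # gate, up and down projections
    norms = 2 * hidden                # two norm weight vectors per layer
    per_layer = attention + mlp + norms

    embeddings = vocab_size * hidden
    total = layers * per_layer + embeddings + hidden  # + final norm vector
    if not tie_embeddings:
        total += embeddings           # separate output projection
    return total


if __name__ == "__main__":
    total = estimate_params(vocab_size=16_384, hidden=768, layers=12, mlp_dim=2048)
    print(f"{total / 1e6:.1f}M parameters")
```

Under these assumptions the estimate lands at about 97.5M, in line with the stated size; untying the embeddings would add roughly another 12.6M.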

3 replies

reacted to sourceoftruthdata's post with ❤️🤗 4 months ago

Post

3439

What a fantastic community!

1 reply

reacted to appvoid's post with 👍 4 months ago

Post

4112

today is going to be a great day for small models, are you ready?

3 replies

reacted to their post with 🔥 4 months ago

Post

4251

Just tried to create an educational assistant for younger people who can struggle with visualisation of 'what is this sorcery all about'.
It's the first step of my spare time projects, SFT on Qwen3-8B.

EduHelper is a child-friendly tutoring assistant fine-tuned from the Qwen3-8B base model using parameter-efficient fine-tuning (PEFT) with LoRA on the ajibawa-2023/Education-Young-Children dataset.
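
To give a feel for why LoRA counts as parameter-efficient, the sketch below compares the trainable parameters of a low-rank adapter with those of the full weight matrix it wraps. The 4096 x 4096 matrix size and rank 16 are hypothetical values for illustration; the post does not state the rank or which layers were adapted.

```python
def lora_extra_params(d_in, d_out, rank):
    """Trainable parameters added by one LoRA adapter.

    LoRA learns two small matrices, A (rank x d_in) and B (d_out x rank),
    instead of updating the full d_out x d_in weight.
    """
    return rank * (d_in + d_out)


if __name__ == "__main__":
    d_in = d_out = 4096   # hypothetical projection size
    rank = 16             # hypothetical LoRA rank
    full = d_in * d_out
    extra = lora_extra_params(d_in, d_out, rank)
    print(f"full matrix: {full:,} params")
    print(f"LoRA adapter: {extra:,} params ({extra / full:.2%} of full)")
```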

s3nh/EduHelp-8B

Glad to share my work, have a wonderful day!

2 replies

reacted to ZennyKenny's post with 👍 4 months ago

Post

2186

Did Hugging Face just ban hammer a bunch of bot accounts or am I just so uninteresting that 30% of my subs dropped me overnight?

😬 Wait, don't answer that.

2 replies

posted an update 4 months ago

Post

684

EduHelp with more empathy, based on a model fine-tuned on psychotherapeutic preferences, just landed.

Beck-8B as the base model, 13,000 steps on an educational dataset.
Time to go further and build more 🥰
s3nh/EduHelp_Beck_8B
Thanks to @basilic_ai for computations <3

replied to their post 4 months ago

Thanks!

posted an update 4 months ago

Post

4251

2 replies

reacted to Severian's post with 👀 4 months ago

Post

404

New Technique to Deeply Poison AI on Images and Prove Creative Provenance

I've developed a new method to protect creative work from unauthorized AI training. My Poisonous Shield for Images algorithm embeds a deep, removal-resistant poison into the mathematical structure of your images. It's designed to be toxic to machine learning models, disrupting AI training convergence by 20% to 348% in benchmark tests.

Unlike traditional watermarks, this protection survives compression and resizing and is not removed by standard tools. The technique also embeds cryptographic proof of provenance directly into the image, verifying ownership and detecting tampering.

You can see examples and learn more about how and WHY it works better than current methods:

https://severian-poisonous-shield-for-images.static.hf.space

If you are interested in using this technology to protect your work from AI training and unauthorized use, please reach out to me. It is currently in the prototype phase but fully functioning and effective. Still working on expanding it to a production-grade usable app.

This is not intended as a pure self-promotion post. I am genuinely wanting to help creators and want to gauge interest from different communities. I've spent the past year and a half building this from scratch with new math and code to try and solve this massive problem.

reacted to Severian's post with 👍 5 months ago

Post

3248

MLX port of BDH (Baby Dragon Hatchling) is up!

I’ve ported the BDH ( https://github.com/pathwaycom/bdh ) model to MLX for Apple Silicon. It’s a faithful conversion of the PyTorch version: same math, same architecture (byte-level vocab, shared weights across layers, ReLU sparsity, RoPE attention with Q=K), with MLX-friendly APIs and a detailed README explaining the few API-level differences and why results are equivalent.

Code, docs, and training script are ready to use. You may need to adjust the training script a bit to fit your own custom dataset. Only tested on M4 so far, but it should work perfectly for any M1/M2/M3 users out there.

I’m currently training this MLX build on my Internal Knowledge Map (IKM) dataset Severian/Internal-Knowledge-Map
Training’s underway; expect a day or so before I publish weights. When it’s done, I’ll upload the checkpoint to Hugging Face for anyone to test.

Repo: https://github.com/severian42/BDH-MLX
HF model (coming soon): Severian/BDH-MLX

If you try it on your own data, feedback and PRs are welcome.

reacted to mitkox's post with 🚀 5 months ago

Post

406

Hermes4 70B synthetic dataset generation on my desktop Z8 GPU rig:
307 tok/sec
1.1M tok/hour

The bottleneck for generating massive, high-quality reinforcement learning datasets is never the GPU compute; it's always the model's willingness to actually answer the darn question.
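
The two throughput figures in the post above are consistent with each other, as this small conversion shows. The function name is hypothetical, and the calculation assumes the rate is sustained for the full hour.

```python
def tokens_per_hour(tokens_per_second):
    """Convert a sustained tokens-per-second rate to tokens per hour."""
    return tokens_per_second * 3600


if __name__ == "__main__":
    rate = 307  # tok/sec, as reported in the post
    hourly = tokens_per_hour(rate)
    print(f"{rate} tok/sec is {hourly:,} tok/hour (about {hourly / 1e6:.1f}M)")
```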
