Introducing Dhara-70M: A diffusion language model that achieves 3.8x higher throughput than autoregressive models!
Key findings from our research on optimal architectures for small language models:
→ Depth beats width: 32 layers outperforms 12 layers at the same parameter count → Best-in-class factuality: 47.5% on TruthfulQA → 10x training efficiency using WSD (Warmup-Stable-Decay) conversion → Canon layers add only 0.13% parameters but improve reasoning
We trained on 1B tokens using the optimal 50-30-20 dataset mix (PDFs + filtered web + educational content), then converted to diffusion with just 100M additional tokens.
#Nesso-4B is a fine-tuned version of Qwen-4B, trained on a highly curated and balanced dataset designed specifically for multilingual agentic workflows and conversational use cases.
As shown in the video below we simulate, the new “cowork” from #Antrophic, without any data sharing all running on a consumer device. The model can be used to build agentic behavior in #privateAI environments.
Not every problem requires super intelligence: in many cases, intelligence at the edge is more than enough.
Rain-100M is a raw base model (not instruction-tuned or safety-aligned), aimed at small-scale research, debugging training pipelines, and CPU/edge experiments. If you run evaluations, finetunes, or visualizations with it, I would be very interested in your results!
Just tried to create an educational assistant for younger people who can struggle with visualsation of 'what is this sorcery all about'. Its first step of my spare time projects, sft on Qwen3-8B,
EduHelper is a child-friendly tutoring assistant fine-tuned from the Qwen3-8B base model using parameter-efficient fine-tuning (PEFT) with LoRA on the ajibawa-2023/Education-Young-Children dataset.
Eduhelp with more empathy, based on model finetuned on psychotheraputic preferences just landed on
Beck-8B as a base model, 13000 steps on educational dataset. Time to go further and build more 🥰 s3nh/EduHelp_Beck_8B Thanks to @basilic_ai for computations <3
Just tried to create an educational assistant for younger people who can struggle with visualsation of 'what is this sorcery all about'. Its first step of my spare time projects, sft on Qwen3-8B,
EduHelper is a child-friendly tutoring assistant fine-tuned from the Qwen3-8B base model using parameter-efficient fine-tuning (PEFT) with LoRA on the ajibawa-2023/Education-Young-Children dataset.
New Technique to Deeply Poison AI on Images and Prove Creative Provenance
I've developed a new method to protect creative work from unauthorized AI training. My Poisonous Shield for Images algorithm embeds a deep, removal-resistant poison into the mathematical structure of your images. It's designed to be toxic to machine learning models, achieving up to 20-348% disruption in AI training convergence in benchmark tests.
Unlike traditional watermarks, this protection survives compression and resizing and is not removed by standard tools. The technique also embeds cryptographic proof of provenance directly into the image, verifying ownership and detecting tampering.
You can see examples and learn more about how and WHY it works better than current methods:
If you are interested in using this technology to protect your work from AI training and unauthorized use, please reach out to me. It is currently in the prototype phase but fully functioning and effective. Still working on expanding it to a production-grade usable app.
This is not intended as a pure self-promotion post. I am genuinely wanting to help creators and want to gauge interest from different communities. I've spent the past year and a half building this from scratch with new math and code to try and solve this massive problem.
I’ve ported the BDH ( https://github.com/pathwaycom/bdh ) model to MLX for Apple Silicon. It’s a faithful conversion of the PyTorch version: same math, same architecture (byte-level vocab, shared weights across layers, ReLU sparsity, RoPE attention with Q=K), with MLX-friendly APIs and a detailed README explaining the few API-level differences and why results are equivalent.
Code, docs, and training script are ready to use. You may need to adjust the training script a bit to fit your own custom dataset. Only tested on M4 so far, but should work perfect for any M1/M2/M3 users out there.
I’m currently training this MLX build on my Internal Knowledge Map (IKM) dataset Severian/Internal-Knowledge-Map Training’s underway; expect a day or so before I publish weights. When it’s done, I’ll upload the checkpoint to Hugging Face for anyone to test.
Hermes4 70B synthetic dataset generation on my desktop Z8 GPU rig: 307 tok/sec 1.1M tok/hour
The bottleneck for generating massive, high-quality reinforcement learning datasets is never the GPU compute; it's always the model's willingness to actually answer the darn question.