AI Analysis
AI analysis not yet available for this target.
Recent tweets
I also tried Unsloth's 35B A3B MTP on my 3060 12GB, and it's not as good as on the 3090
MTP probably gives better decode speed on its own, but it leads to more experts being offloaded to the CPU,
so overall decode speed ends up lower with the MTP version: 33 tok/s vs 39 tok/s
I'll stick with classic ik_llama.cpp for my 3060
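The tradeoff above is essentially a VRAM budget problem: the MTP head's weights take memory that would otherwise hold expert tensors, so more experts get offloaded to the CPU. Here is a minimal Python sketch of that arithmetic, where every size is a made-up placeholder (real numbers depend on the model, quant, and context length):

```python
# Sketch of the MTP VRAM tradeoff described above.
# All sizes below are made-up placeholders, not measured values.

VRAM_GB = 12.0          # e.g. an RTX 3060
KV_CACHE_GB = 2.5       # assumed KV-cache budget
MTP_HEAD_GB = 1.2       # assumed size of the MTP head weights
EXPERT_LAYER_GB = 0.35  # assumed per-layer expert tensor size

def gpu_expert_layers(use_mtp: bool) -> int:
    """How many expert layers still fit on the GPU after fixed costs."""
    budget = VRAM_GB - KV_CACHE_GB - (MTP_HEAD_GB if use_mtp else 0.0)
    return int(budget // EXPERT_LAYER_GB)

for use_mtp in (False, True):
    print(f"MTP={use_mtp}: ~{gpu_expert_layers(use_mtp)} expert layers on GPU,"
          f" rest offloaded to CPU")
```

With these placeholder numbers the MTP build fits a few fewer expert layers on the GPU, which is the kind of shift that can flip the net decode speed, consistent with the 33 vs 39 tok/s observation.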
Unsloth released MTP (Multi-Token Prediction) versions of Qwen 3.6 27B and 35B A3B
these give a pretty nice boost on the decode side, but slightly impact prefill
I think this will still be my default setup to gain some decode speed; the prefill drawback is acceptable for me
you'll need this specific branch of llama.cpp: https://t.co/cJ9sUahlqx
https://t.co/k4AUdOOVKt
https://t.co/6x8rL1saVv
Some MTP metadata was missing yesterday, but it seems they've published the proper GGUFs since
works great with thinking too https://t.co/CuHJoaVvVJ
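Why MTP boosts decode at all: instead of producing one token per forward pass, the model drafts a few extra tokens and a verification step keeps the longest matching prefix. Below is a toy Python sketch of that accept/verify loop; both "models" here are random stand-ins, not the actual Qwen/llama.cpp implementation:

```python
# Toy illustration of MTP-style speculative decoding: draft K tokens
# per pass, keep the prefix that agrees with the base model.
import random

rng = random.Random(0)
K = 2  # extra tokens drafted per forward pass (assumed)

def base_next(ctx):
    """Stand-in for the base model's next token: deterministic in ctx."""
    return hash(tuple(ctx)) % 100

def mtp_draft(ctx):
    """Stand-in for the MTP head: agrees with the base model ~70% per token."""
    out, c = [], list(ctx)
    for _ in range(K):
        t = base_next(c) if rng.random() < 0.7 else rng.randrange(100)
        out.append(t)
        c.append(t)
    return out

ctx, passes, produced = [1, 2, 3], 0, 0
while produced < 200:
    passes += 1
    for t in mtp_draft(ctx):          # accept drafted tokens that match
        if t == base_next(ctx):
            ctx.append(t)
            produced += 1
        else:
            break                     # first mismatch ends the accepted prefix
    ctx.append(base_next(ctx))        # the verify pass always yields one token
    produced += 1

print(f"{produced} tokens in {passes} passes -> {produced/passes:.2f} tokens/pass")
```

With a 70% per-token agreement rate and K=2, this lands around 2.2 tokens per pass, which is where the decode-side gain comes from; prefill doesn't benefit because it already processes many tokens per pass.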
small GPU owners, fellow 3060 users, 8 or 12GB VRAM chads,
> if you're using llama.cpp you should check out ik_llama.cpp
it gives about 10% better performance at 128k total context running Qwen3.6-35B-A3B IQ4_XS on a 3060 12GB
31.7 tok/s at 50% of context is not bad for a subagent in a local Hermes setup
https://t.co/ZCN728Taoh
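If you want to reproduce a tok/s figure like this, a rough way is to time a single completion against the local server; llama.cpp's llama-server (and the ik_llama.cpp fork) typically exposes an OpenAI-compatible endpoint. The URL, port, and model name below are assumptions for a typical local setup:

```python
# Rough tok/s measurement against a local OpenAI-compatible server.
import time
import json
import urllib.request

URL = "http://127.0.0.1:8080/v1/chat/completions"  # assumed local llama-server

payload = json.dumps({
    "model": "local",  # placeholder; local servers often ignore this field
    "messages": [{"role": "user", "content": "Write ~200 words about GPUs."}],
    "max_tokens": 256,
}).encode()

req = urllib.request.Request(
    URL, data=payload, headers={"Content-Type": "application/json"}
)
t0 = time.time()
with urllib.request.urlopen(req) as resp:
    body = json.load(resp)
elapsed = time.time() - t0

tokens = body["usage"]["completion_tokens"]
print(f"{tokens} tokens in {elapsed:.1f}s -> {tokens / elapsed:.1f} tok/s "
      "(includes prefill time, so pure decode speed is higher)")
```

Note that this number folds prompt processing into the total, so it reads lower than pure decode speed; the server's own per-request timing logs separate prefill from decode if you need a cleaner figure.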
Signal Timeline
@CryptoPicsou followed
Score breakdown (0–100)
Score breakdown not yet computed.
Score: 0 (below threshold of 70)
Watching for additional signals.
Followers: 1.9K
Account age: 1.2y
Scouts: 0
First seen: 1w ago