Jim Lai
grimjim
AI & ML interests
Experimenting primarily with 7B-12B parameter text completion models. Not all models are intended for direct end use; some are meant for research and/or educational purposes.
Recent Contributions: stabilized refusal direction ablation via Gram-Schmidt orthonormalization and norm-preserving interventions; confirmed reasoning transfer via model merging.
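For illustration only, a minimal PyTorch sketch of what refusal-direction ablation with Gram-Schmidt orthonormalization and norm-preserving rescaling could look like. The function names, shapes, and tolerances are assumptions, not the code used in those experiments.

```python
import torch

def gram_schmidt(directions: torch.Tensor) -> torch.Tensor:
    """Orthonormalize a set of candidate refusal directions (rows) via Gram-Schmidt.

    directions: (k, d_model). Returns an orthonormal basis of the spanned subspace.
    """
    basis = []
    for v in directions:
        for b in basis:
            v = v - (v @ b) * b          # remove components along earlier basis vectors
        norm = v.norm()
        if norm > 1e-8:                  # skip near-degenerate directions
            basis.append(v / norm)
    return torch.stack(basis)

def ablate_norm_preserving(h: torch.Tensor, basis: torch.Tensor) -> torch.Tensor:
    """Project out the refusal subspace from hidden states h, then restore the original norm.

    h: (..., d_model) hidden states; basis: (k, d_model) orthonormal refusal directions.
    """
    original_norm = h.norm(dim=-1, keepdim=True)
    projection = (h @ basis.T) @ basis   # component of h lying in the refusal subspace
    h_ablated = h - projection
    return h_ablated * original_norm / h_ablated.norm(dim=-1, keepdim=True).clamp_min(1e-8)
```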
Recent Activity
updated a model 1 day ago: grimjim/Equatorium-v1-12B
published a model 1 day ago: grimjim/Equatorium-v1-12B
posted an update 2 days ago:
After tinkering with Gemma Scope 2, I now have a mechanistic explanation of why Winsorization was as effective as it was in my ablation experiments on Gemma 3 12B Instruct. In short, the activation for the BOS token overwhelms everything else. Gemma Scope 2 deliberately did not train on the BOS token. Winsorization capped the magnitude of the BOS token activation, allowing the activations of other tokens to be compared.
https://huggingface.co/google/gemma-scope-2-12b-it
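For illustration, a minimal PyTorch sketch of the kind of winsorization described in the post: capping per-token activation norms at a percentile threshold so the outsized BOS activation no longer dominates. The function name, shapes, and the percentile cap are assumptions, not the exact procedure used in the experiments.

```python
import torch

def winsorize_activations(activations: torch.Tensor, percentile: float = 0.99) -> torch.Tensor:
    """Cap per-token activation norms at a percentile threshold.

    activations: (seq_len, d_model) residual-stream activations for one prompt.
    The BOS token (position 0) typically has a far larger norm than the rest;
    capping it lets the remaining tokens be compared on a common scale.
    """
    norms = activations.norm(dim=-1)             # (seq_len,) per-token norms
    cap = torch.quantile(norms, percentile)      # winsorization threshold
    scale = torch.clamp(cap / norms, max=1.0)    # shrink only the outliers
    return activations * scale.unsqueeze(-1)
```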