| Topic | Replies | Views | Activity |
| --- | --- | --- | --- |
| Caching image prototype embeddings for image-guided object detection using OWL-ViT | 2 | 482 | January 30, 2026 |
| [Question] How to specify 'model_type' of 'Qwen/Qwen3-VL-8B-Instruct-GGUF'? | 3 | 10 | January 30, 2026 |
| SAM3Video: CLIPTextModelOutput passed as tensor causes crash with text prompts | 0 | 10 | January 29, 2026 |
| Different lm_head size and vocab_size | 1 | 908 | January 28, 2026 |
| Custom KV Cache Steering Implementation Fails with IndexError in LLaVA Generation | 1 | 12 | January 28, 2026 |
| Transformers v5 timelines | 1 | 34 | January 28, 2026 |
| Issue: Discrepancy Between Layer-Wise Density Plots vs. Mean Trajectory Plots in LLaVA-1.5 Attention Analysis | 2 | 18 | January 25, 2026 |
| [Discussion] Validating Attention Map Visualization for Visual Fading in LLaVA-1.5 | 4 | 37 | January 23, 2026 |
| No fix for High Vulnerabilities in transformers latest package | 2 | 31 | January 22, 2026 |
| How to disable caching in .from_pretrained() | 4 | 1228 | January 18, 2026 |
| DetLLM – Deterministic Inference Checks | 0 | 22 | January 17, 2026 |
| Distributed LLaMA Inference Engine Built from Scratch (KV Cache, GQA, RoPE) | 0 | 26 | January 16, 2026 |
| Run name issue, different run name file in webpage & local | 1 | 88 | January 16, 2026 |
| Whisper fine-tuned with custom tokens works with model.generate but doesn't with a pipeline() | 3 | 38 | January 14, 2026 |
| GPT 2 finetuning peaks at 8 GiB of VRAM | 7 | 78 | January 12, 2026 |
| Model_accepts_loss_kwargs detection based on **kwargs is too permissive | 2 | 260 | January 5, 2026 |
| Seeking Advice🔥🔥\| Strategy for Embedding Multiple Subjective Reviews in One-time Event Domain Recommendations | 2 | 42 | January 23, 2026 |
| TurboTensors: Optimizing CPU LLM Performance | 0 | 22 | December 31, 2025 |
| Significant generation degradation and repetition loops when enabling KV-cache for Qwen3-VL | 2 | 89 | December 29, 2025 |
| Injecting multi modal embeddings into a language model breaks the `generate` function | 1 | 88 | December 28, 2025 |
| Transformers v4 or v5 for my new project? | 1 | 70 | December 27, 2025 |
| Assistant model is not passed onto the custom_generate method | 3 | 24 | December 25, 2025 |
| How can I get TRANSFORMERS_CACHE in transformers v5? | 2 | 49 | December 19, 2025 |
| CDM-CTM Fusion: A Rigorous Framework for Depth-Aware Autoregressive Control | 0 | 19 | December 13, 2025 |
| Tensor Dimension Mismatch when using TRL GKDTrainer | 3 | 20 | December 12, 2025 |
| Transformers.js need for token-to-char mapping | 3 | 35 | December 11, 2025 |
| [Pipelines] Mask Generation Parameters | 2 | 121 | December 10, 2025 |
| Having trouble configuring the Trainer for T5 model evaluation | 1 | 42 | December 9, 2025 |
| How do I speed up my callbacks and reduce the stall before they start? | 1 | 38 | December 9, 2025 |
| Getting 429 Too Many Requests | 3 | 135 | December 8, 2025 |