Difan Jiao

difanjiao

difanj0713

AI & ML interests

Generative Models & Mech Interp

Recent Activity

submitted a paper 3 days ago

LLM Safety From Within: Detecting Harmful Content with Internal Representations

upvoted a paper 4 days ago

LLM Safety From Within: Detecting Harmful Content with Internal Representations

updated a model 4 days ago

UofTCSSLab/SIREN-Llama-3.1-8B

Collections 1

Papers 3

arxiv:2604.18519

arxiv:2604.01591

arxiv:2508.18179

Models 6

difanjiao/vanilla_grpo_math_Qwen3-4B

4B • Updated 19 days ago • 19

difanjiao/ThinkTwice-Olmo3-7B-Instruct

7B • Updated 20 days ago • 37

difanjiao/ThinkTwice-Qwen3-4B-Instruct

4B • Updated 20 days ago • 77 • 2

difanjiao/OpenCharacter-Qwen3-4B-Instruct-2507

Updated Mar 20 • 5

difanjiao/chessllm-qwen3-0.6b-sft-only

0.6B • Updated Feb 12 • 3

difanjiao/chessllm-qwen3-0.6b

0.8B • Updated Feb 12 • 3

Datasets 1

difanjiao/ExploreToM-80k

Updated Feb 12 • 48.8k • 15