Chenlu123/ablation_token_inistd1e-6_clip_0.2_0.26_c10.0_alpha0.0_beta0.0_lr3e-4_n8_layers_all_step1040 2B • Updated 1 day ago • 11
Chenlu123/ablation_token_inistd1e-6_clip_0.2_0.26_c10.0_alpha0.0_beta0.0_lr3e-4_n8_layers_all_step1000 2B • Updated 1 day ago • 10
Chenlu123/ablation_token_inistd1e-6_clip_0.2_0.26_c10.0_alpha0.0_beta0.0_lr3e-4_n8_layers_all_step960 2B • Updated 1 day ago • 10
Chenlu123/ablation_token_inistd1e-6_clip_0.2_0.26_c10.0_alpha0.0_beta0.0_lr3e-4_n8_layers_all_step920 2B • Updated 1 day ago • 8
Chenlu123/ablation_token_inistd1e-6_clip_0.2_0.26_c10.0_alpha0.0_beta0.0_lr3e-4_n8_layers_all_step880 2B • Updated 1 day ago • 7
Chenlu123/ablation_token_inistd1e-6_clip_0.2_0.26_c10.0_alpha0.0_beta0.0_lr3e-4_n8_layers_all_step840 2B • Updated 1 day ago • 8
Chenlu123/ablation_token_inistd1e-6_clip_0.2_0.26_c10.0_alpha0.0_beta0.0_lr3e-4_n8_layers_all_step800 2B • Updated 1 day ago • 8
Chenlu123/ablation_token_inistd1e-6_clip_0.2_0.26_c10.0_alpha0.0_beta0.0_lr3e-4_n8_layers_all_step760 2B • Updated 1 day ago • 5
Chenlu123/ablation_token_inistd1e-6_clip_0.2_0.26_c10.0_alpha0.0_beta0.0_lr3e-4_n8_layers_all_step720 2B • Updated 1 day ago • 8
Chenlu123/ablation_token_inistd1e-6_clip_0.2_0.26_c10.0_alpha0.0_beta0.0_lr3e-4_n8_layers_all_step680 2B • Updated 1 day ago • 12
Chenlu123/ablation_token_inistd1e-6_clip_0.2_0.26_c10.0_alpha0.0_beta0.0_lr3e-4_n8_layers_all_step640 2B • Updated 1 day ago • 9
Chenlu123/ablation_token_inistd1e-6_clip_0.2_0.26_c10.0_alpha0.0_beta0.0_lr3e-4_n8_layers_all_step600 2B • Updated 1 day ago • 9
Chenlu123/ablation_token_inistd1e-6_clip_0.2_0.26_c10.0_alpha0.0_beta0.0_lr3e-4_n8_layers_all_step560 2B • Updated 1 day ago • 9
Chenlu123/ablation_token_inistd1e-6_clip_0.2_0.26_c10.0_alpha0.0_beta0.0_lr3e-4_n8_layers_all_step520 2B • Updated 2 days ago • 7
Chenlu123/ablation_token_inistd1e-6_clip_0.2_0.26_c10.0_alpha0.0_beta0.0_lr3e-4_n8_layers_all_step480 2B • Updated 2 days ago • 7
Chenlu123/ablation_token_inistd1e-6_clip_0.2_0.26_c10.0_alpha0.0_beta0.0_lr3e-4_n8_layers_all_step440 2B • Updated 2 days ago • 7
Chenlu123/ablation_token_inistd1e-6_clip_0.2_0.26_c10.0_alpha0.0_beta0.0_lr3e-4_n8_layers_all_step400 2B • Updated 2 days ago • 6
Chenlu123/ablation_token_inistd1e-6_clip_0.2_0.26_c10.0_alpha0.0_beta0.0_lr3e-4_n8_layers_all_step360 2B • Updated 2 days ago • 6
Chenlu123/ablation_token_inistd1e-6_clip_0.2_0.26_c10.0_alpha0.0_beta0.0_lr3e-4_n8_layers_all_step320 2B • Updated 2 days ago • 7
Chenlu123/ablation_token_inistd1e-6_clip_0.2_0.26_c10.0_alpha0.0_beta0.0_lr3e-4_n8_layers_all_step280 2B • Updated 2 days ago • 7
Chenlu123/ablation_token_inistd1e-6_clip_0.2_0.26_c10.0_alpha0.0_beta0.0_lr3e-4_n8_layers_all_step240 2B • Updated 2 days ago • 7
Chenlu123/ablation_token_inistd1e-6_clip_0.2_0.26_c10.0_alpha0.0_beta0.0_lr3e-4_n8_layers_all_step200 2B • Updated 2 days ago • 8
Chenlu123/ablation_token_inistd1e-6_clip_0.2_0.26_c10.0_alpha0.0_beta0.0_lr3e-4_n8_layers_all_step160 2B • Updated 2 days ago • 8
Chenlu123/ablation_token_inistd1e-6_clip_0.2_0.26_c10.0_alpha0.0_beta0.0_lr3e-4_n8_layers_all_step120 2B • Updated 2 days ago • 6
Chenlu123/ablation_token_inistd1e-6_clip_0.2_0.26_c10.0_alpha0.0_beta0.0_lr3e-4_n8_layers_all_step80 2B • Updated 2 days ago • 7
Chenlu123/ablation_token_inistd1e-6_clip_0.2_0.26_c10.0_alpha0.0_beta0.0_lr3e-4_n8_layers_all_step40 Updated 2 days ago • 7
Chenlu123/ablation_token_inistd1e-6_clip_0.2_0.26_c10.0_alpha0.0_beta0.0_lr3e-4_n8_layers_all_batch_data Updated 2 days ago
Chenlu123/inistd1e-5_clip_0.2_0.26_c10.0_lr5e-4_qwen_qwen2_5_math_1_5b_n8_layers_all_step720 2B • Updated 7 days ago • 12
Chenlu123/inistd1e-5_clip_0.2_0.26_c10.0_lr5e-4_qwen_qwen2_5_math_1_5b_n8_layers_all_step700 2B • Updated 7 days ago • 12
Chenlu123/inistd1e-5_clip_0.2_0.26_c10.0_lr5e-4_qwen_qwen2_5_math_1_5b_n8_layers_all_step680 2B • Updated 7 days ago • 13