PatronusAI/Qwen3-4B-Instruct-2507-CE-152T-GPT41Tea-notR-L4-M-Ep1-1e-5-Q32-65536-2026Feb21
4B • Updated • 23
LLM Evaluation
Benchmarking Reward Hack Detection in Code Environments via Contrastive Analysis
MEMTRACK: Evaluating Long-Term Memory and State Tracking in Multi-Platform Dynamic Agent Environments