MLAgent

I helped develop an AI research agent that can propose, run, and evaluate ML experiments autonomously. I was responsible for adding a search action to the agent, which lets it retrieve information from arXiv papers and use it to form new research hypotheses. We benchmarked the agent with MLAgentBench, a framework for evaluating research agents.
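
For illustration, here is a minimal sketch of what such a search action could look like. It is not the project's actual code. It assumes the public arXiv API endpoint at export.arxiv.org, which returns an Atom feed, and the function names (build_query_url, parse_feed, search_arxiv) and the returned dictionary layout are hypothetical.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Public arXiv API endpoint; results come back as an Atom feed.
ARXIV_API = "http://export.arxiv.org/api/query"
ATOM = {"atom": "http://www.w3.org/2005/Atom"}


def build_query_url(query, max_results=5):
    """Build an arXiv API URL that searches all fields for `query`."""
    params = {
        "search_query": f"all:{query}",
        "start": 0,
        "max_results": max_results,
    }
    return f"{ARXIV_API}?{urllib.parse.urlencode(params)}"


def parse_feed(feed_xml):
    """Extract title, abstract, and URL from each entry of an Atom feed."""
    root = ET.fromstring(feed_xml)
    papers = []
    for entry in root.findall("atom:entry", ATOM):
        title = entry.findtext("atom:title", default="", namespaces=ATOM)
        summary = entry.findtext("atom:summary", default="", namespaces=ATOM)
        url = entry.findtext("atom:id", default="", namespaces=ATOM)
        papers.append({
            # Collapse the line breaks and indentation arXiv puts in text fields.
            "title": " ".join(title.split()),
            "summary": " ".join(summary.split()),
            "url": url,
        })
    return papers


def search_arxiv(query, max_results=5):
    """Hypothetical search action: query arXiv and return parsed papers."""
    url = build_query_url(query, max_results)
    with urllib.request.urlopen(url, timeout=30) as response:
        return parse_feed(response.read())
```

In a setup like this, the agent would call search_arxiv with a topic string and receive a short list of titles and abstracts to include in its context before proposing the next experiment. Keeping URL construction and feed parsing separate from the network call makes both pieces testable offline.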