Links

https://tongyi-agent.github.io/zh/blog/introducing-tongyi-deep-research/

https://github.com/Alibaba-NLP/DeepResearch

The system is described across 11 papers in total.


One-sentence summaries

WebWalker: Benchmarking LLMs in Web Traversal

https://arxiv.org/pdf/2501.07572

https://github.com/Alibaba-NLP/DeepResearch/tree/main/WebAgent/WebWalker

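Builds WebWalkerQA, a challenging web-traversal benchmark whose questions are multi-hop and multi-source and require deep exploration within each source (680 queries across 4 scenarios, covering 1,373 web pages), and proposes a multi-agent framework, WebWalker (ExplorerAgent + CriticAgent). Even the best of several mainstream LLMs run under this framework scores below 40% accuracy on the benchmark.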
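To make the explorer + critic split concrete, here is a minimal sketch of such a loop over a toy in-memory site. Everything in it is an assumption for illustration: the `SITE` dictionary, the function names `explorer`, `critic`, and `walk`, and the keyword matching that stands in for LLM calls. It is not the WebWalker implementation from the paper or the repository.

```python
from collections import deque

# Hypothetical toy site: page URL -> (page text, outgoing links).
SITE = {
    "/": ("Welcome to the conference site.", ["/program", "/venue"]),
    "/program": ("Program overview.", ["/program/keynotes"]),
    "/venue": ("The venue is the Grand Hall.", []),
    "/program/keynotes": ("The keynote starts at 09:00.", []),
}


def explorer(frontier, visited):
    """Pick the next unvisited page (breadth-first stand-in for an LLM choosing a link)."""
    while frontier:
        url = frontier.popleft()
        if url not in visited:
            return url
    return None


def critic(memory, keywords):
    """Stand-in for an LLM critic: is every keyword covered by the collected text?"""
    covered = {k for k in keywords if any(k in text.lower() for text in memory)}
    return covered == set(keywords)


def walk(root, keywords, max_steps=10):
    """Alternate explorer and critic until the critic is satisfied or the budget runs out."""
    frontier, visited, memory = deque([root]), [], []
    for _ in range(max_steps):
        url = explorer(frontier, visited)
        if url is None:
            break
        visited.append(url)
        text, links = SITE[url]
        # Keep only page text that mentions at least one keyword.
        if any(k in text.lower() for k in keywords):
            memory.append(text)
        if critic(memory, keywords):
            break
        frontier.extend(links)
    return memory, visited


if __name__ == "__main__":
    memory, visited = walk("/", ["keynote", "venue"])
    print(visited)
    print(memory)
```

The toy query needs facts from two different pages, one of them two clicks below the root, which mirrors the multi-source and deep-exploration properties the summary attributes to WebWalkerQA.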


https://huggingface.co/spaces/callanwu/WebWalkerQALeaderboard