Peng Qi

齐鹏

(pinyin: /qí péng/; IPA: /tɕʰǐ pʰə̌ŋ/)

I am an AI researcher working on AI agents, natural language processing, machine learning, and multimodal foundation models. I am currently leading research efforts at Uniphore, which I joined through its acquisition of Orby AI.

My research is driven by the goal of bringing the world’s knowledge to the user’s assistance, which manifests itself in the following directions:

  • How to effectively organize and use knowledge. This involves tasks like question answering (where I have co-led the development of some benchmarks for complex reasoning: HotpotQA and BeerQA), information extraction, syntactic analysis for many languages (check out Stanza), etc.
  • How to effectively communicate knowledge. This mainly concerns interactive NLP systems such as conversational systems, where I am interested in theory-of-mind reasoning under information asymmetry (e.g., how to ask good questions and how to provide good answers beyond the literal answer), offline-to-online transfer, multimodal interactions, etc. On the application side, I co-led the founding research team that launched Amazon Q at Amazon.
  • How to leverage interactive knowledge to help users better perform tasks. This mainly concerns multimodal digital agents operating on real-world user interfaces and solving problems on behalf of users. Want to learn more? Consider joining our research team at Uniphore!

In all of these directions, I am also excited to explore data-efficient models and training techniques, model and system explainability, and self-supervised learning techniques that enable us to address these problems.

Before joining Uniphore / Orby, I worked at Amazon Web Services (AWS) as a senior applied scientist and, before that, at JD.com AI Research as a senior research scientist. I obtained my Ph.D. in Computer Science at Stanford University, advised by Prof. Chris Manning, where I was a member of the NLP group and AI Lab. I also obtained two Master’s degrees at Stanford (CS & Statistics) and my Bachelor’s degree at Tsinghua University.

[CV (slightly outdated)]

selected publications

  1. arXiv
    Abductive Preference Learning
    Yijin Ni and Peng Qi
    arXiv preprint arXiv:2510.09887, 2025
  2. arXiv
    WARC-Bench: Web Archive Based Benchmark for GUI Subtask Executions
    Sanjari Srivastava, Gang Li, Cheng Chang, Rishu Garg, Manpreet Kaur, Charlene Y Lee, Yuezhang Li, Yining Mao, Ignacio Cases, Yanan Xie, and others
    arXiv preprint arXiv:2510.09872, 2025
  3. arXiv
    AutoRubric-R1V: Rubric-Based Generative Rewards for Faithful Multimodal Reasoning
    Mengzhao Jia, Zhihan Zhang, Ignacio Cases, Zheyuan Liu, Meng Jiang, and Peng Qi
    arXiv preprint arXiv:2510.14738, 2025
  4. arXiv
    PolySkill: Learning Generalizable Skills Through Polymorphic Abstraction
    Simon Yu, Gang Li, Weiyan Shi, and Peng Qi
    arXiv preprint arXiv:2510.15863, 2025
  5. ACL
    CiteEval: Principle-Driven Citation Evaluation for Source Attribution
    Yumo Xu, Peng Qi*, Jifan Chen*, Kunlun Liu, Rujun Han, Lan Liu, Bonan Min, Vittorio Castelli, Arshit Gupta, and Zhiguo Wang
    In Association for Computational Linguistics (ACL), 2025
  6. arXiv
    Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents
    Yu Gu, Boyuan Zheng, Boyu Gou, Kai Zhang, Cheng Chang, Sanjari Srivastava, Yanan Xie, Peng Qi, Huan Sun, and Yu Su
    2024
  7. EMNLP
    RAG-QA Arena: Evaluating Domain Robustness for Long-form Retrieval Augmented Question Answering
    Rujun Han, Yuhao Zhang, Peng Qi, Yumo Xu, Wang Jenyuan, Lan Liu, William Yang Wang, Bonan Min, and Vittorio Castelli
    In Empirical Methods in Natural Language Processing (EMNLP), 2024
  8. EMNLP
    Answering Open-Domain Questions of Varying Reasoning Steps from Text
    Peng Qi*, Haejun Lee*, Oghenetegiri "TG" Sido*, and Christopher D. Manning
    In Empirical Methods in Natural Language Processing (EMNLP), 2021
  9. ACL (Demo)
    Stanza: A Python Natural Language Processing Toolkit for Many Human Languages
    Peng Qi*, Yuhao Zhang*, Yuhui Zhang, Jason Bolton, and Christopher D. Manning
    In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, 2020
  10. EMNLP
    HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering
    Zhilin Yang*, Peng Qi*, Saizheng Zhang*, Yoshua Bengio, William W. Cohen, Ruslan Salakhutdinov, and Christopher D. Manning
    In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2018