Yufan Zhuang

CSE@UC San Diego


Anywhere on campus :)

9500 Gilman Drive

La Jolla, California

Hey, I’m Yufan Zhuang (庄宇凡)!

I am a PhD student in UCSD’s Computer Science & Engineering department advised by Jingbo Shang. My research is centered on natural language processing and large language models.

I have proposed ways to perform in-context learning from any modality (Vector-ICL), studied cross-lingual data contamination in contemporary LLMs, and made transformers effective meta-algorithm learners for decision trees (MetaTree).

I’ve also trained the first series of high-quality Mamba-based vision-language models (ViperVLMs), extended LLMs’ context length with adaptive wavelet transforms (WavSpA), and contributed to a series of works in computational sociology.

Prior to my PhD studies, I worked at IBM T. J. Watson Research Center as a research engineer, helping to enhance software engineering with the power of AI and vice versa. I received my MS in Data Science from Columbia and my BSc in Applied Math with a minor in CS (First Class Honors) from Hong Kong Polytechnic University.

news

Jan 28, 2025 Our paper Vector-ICL has been accepted at ICLR 2025!
Sep 01, 2024 Our paper Data Contamination Can Cross Language Barriers has been published at EMNLP 2024!
Aug 30, 2024 Our paper Learning a Decision Tree Algorithm with Transformers has been published in Transactions on Machine Learning Research!
Jun 19, 2023 I’ve started my summer internship at the Deep Learning group of Microsoft Research (Redmond, Washington), mentored by Chandan Singh and Liyuan Liu!
May 12, 2023 Our paper “Incorporating Signal Awareness in Source Code Modeling: an Application to Vulnerability Detection” has been published in ACM Transactions on Software Engineering and Methodology!

selected publications

  1. ICLR
    Vector-ICL: In-context Learning with Continuous Vector Representations
    Yufan Zhuang, Chandan Singh, Liyuan Liu, and 2 more authors
    International Conference on Learning Representations, 2025
  2. EMNLP
    Data Contamination Can Cross Language Barriers
    Feng Yao*, Yufan Zhuang*, Zihao Sun, and 3 more authors
    Empirical Methods in Natural Language Processing, 2024
  3. TMLR
    Learning a Decision Tree Algorithm with Transformers
    Yufan Zhuang, Liyuan Liu, Chandan Singh, and 2 more authors
    Transactions on Machine Learning Research, 2024
  4. UniReps@NeurIPS
    WavSpA: Wavelet Space Attention for Boosting Transformers’ Long Sequence Learning Ability
    Yufan Zhuang, Zihan Wang, Fangbo Tao, and 1 more author
    NeurIPS 1st UniReps Workshop, 2023
  5. TOSEM
    Incorporating Signal Awareness in Source Code Modeling: an Application to Vulnerability Detection
    Sahil Suneja, Yufan Zhuang, Yunhui Zheng, and 3 more authors
    ACM Transactions on Software Engineering and Methodology, 2023
  6. AAG
    Sleeping Lion or Sick Man? Machine Learning Approaches to Deciphering Heterogeneous Images of Chinese in North America
    Qiang Fu, Yufan Zhuang, Yushu Zhu, and 1 more author
    Annals of the American Association of Geographers, 2022
  7. FSE
    Probing model signal-awareness via prediction-preserving input minimization
    Sahil Suneja*, Yunhui Zheng*, Yufan Zhuang*, and 2 more authors
    ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2021
  8. BDR
    Agreeing to disagree: choosing among eight topic-modeling methods
    Qiang Fu, Yufan Zhuang, Jiaxin Gu, and 2 more authors
    Big Data Research, 2021