About Me

I am a Technical Manager at Baidu, leading the large model team at PaddlePaddle. We work across the full AI stack for large language models—from infrastructure and development tools to algorithms and applications—to power the ERNIE large model series, including ERNIE 3.5, 4.0, 4.5, and 5.0.

I have led over 10 open-source projects at Baidu that have collectively earned 150,000+ GitHub stars, including PaddleOCR, ERNIE, PaddleFormers, and PaddleX. These projects power critical AI applications worldwide, from multilingual document understanding to large language model training and deployment at scale.

My technical expertise spans computer vision, vision-language models, large language models, and autonomous driving. I hold a Ph.D. from Nanyang Technological University, Singapore (2017) and a B.Eng. from Harbin Institute of Technology, China (2013). Before joining Baidu in 2018, I was a Data Scientist at HP Labs Singapore. At Baidu, I have also collaborated extensively with the Apollo team on autonomous driving technologies, contributing to Robotaxi and AD 2.0—an experience I deeply value.

🔥 We're Hiring!

We are always looking for talented interns and full-time engineers to join our team in areas including: Large Language Models, Reinforcement Learning, Multimodal Algorithms, Computer Vision, and LLM Agent Applications. If you are interested, please feel free to send your resume to my email.

News

  • Nov 2025 We open-sourced ERNIE-4.5-VL-28B-A3B-Thinking: a lightweight VLM (3B active) matching flagship models, excelling in visual reasoning, STEM solving, visual grounding, and video understanding with tool utilization.
  • Oct 2025 We open-sourced PaddleOCR-VL: a SOTA 0.9B VLM for document parsing, supporting 109 languages and recognizing complex elements (text, tables, formulas, charts).
  • Sep 2025 We open-sourced ERNIE-4.5-21B-A3B-Thinking: an enhanced reasoning model with improved performance on logical reasoning, mathematics, science, and coding, featuring tool usage and 128K long-context support.
  • Aug 2025 I gave a talk at the IJCAI conference on "PaddlePaddle Deep Learning Framework and its Support for Large Model Training and Inference".
  • Jun 2025 We open-sourced ERNIE 4.5: a family of large-scale multimodal models with 10 variants, including MoE models with 47B or 3B active parameters (up to 424B total) and a 0.3B dense model.
  • May 2025 We open-sourced PaddleOCR 3.0: featuring PP-OCRv5 (high-accuracy text recognition), PP-StructureV3 (general-purpose document parsing), and PP-ChatOCRv4 (intelligent document understanding).

Key Open-Source Projects

PaddleOCR (Active)

A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages with an ultra-lightweight system, and powers popular projects such as Umi-OCR, OmniParser, MinerU, and RAGFlow.

PaddleFormers (Active)

An easy-to-use library of pre-trained large language models based on PaddlePaddle. Implements 4D parallel strategies through a unified Trainer API, supports the SFT and DPO training paradigms, and integrates PEFT, MergeKit, and Quantization APIs for efficient LLM development.

ERNIE (Active)

Official repository for ERNIE 4.5, featuring both MoE and dense models across LLM and multimodal architectures. Provides an end-to-end development pipeline for training, compression, and inference, supporting full-cycle industrial deployment.

PaddleX (Active)

An all-in-one, low-code development tool for AI models built on PaddlePaddle. Integrates over 200 ready-to-use pre-trained models covering OCR, object detection, and time series forecasting, and supports the complete workflow from training to deployment.

PaddleDetection

An end-to-end object detection toolkit providing 30+ algorithms and 250+ pre-trained models. Supports object detection, instance segmentation, keypoint detection, and multi-object tracking, with a complete pipeline from development to deployment.

PaddleNLP

An easy-to-use NLP library with 45+ architectures and 500+ pre-trained models. Supports a wide range of tasks from research to industrial applications, including Neural Search, Question Answering, Information Extraction, and Sentiment Analysis.

PaddleSeg

An easy-to-use image segmentation library providing 45+ models and 150+ pre-trained models. Supports Semantic Segmentation, Interactive Segmentation, Panoptic Segmentation, Image Matting, and 3D Segmentation, with a complete workflow from labeling to deployment.

PaddleClas

A comprehensive toolkit for image classification and recognition. Includes advanced algorithms such as PP-HGNet, PP-LCNetv2, and PP-LCNet, and provides 35 model series with 164 ImageNet pre-trained models for industrial and academic applications.

Publications