Yi Liu - Technical Manager, PaddlePaddle

About Me

I am a Technical Manager at Baidu, leading the large model team at PaddlePaddle. We work across the full AI stack for large language models—from infrastructure and development tools to algorithms and applications—to power the ERNIE large model series, including ERNIE 3.5, 4.0, 4.5, and 5.0.

I have led more than 10 open-source projects at Baidu, including PaddleOCR, ERNIE, PaddleFormers, and PaddleX, which have collectively earned 150,000+ GitHub stars. These projects power critical AI applications worldwide, from multilingual document understanding to large language model training and deployment at scale.

My technical expertise spans computer vision, vision-language models, large language models, and autonomous driving. I hold a Ph.D. from Nanyang Technological University, Singapore (2017) and a B.Eng. from Harbin Institute of Technology, China (2013). Before joining Baidu in 2018, I was a Data Scientist at HP Labs Singapore. At Baidu, I have also collaborated extensively with the Apollo team on autonomous driving technologies, contributing to Robotaxi and AD 2.0—an experience I deeply value.

We're Hiring!

We are always looking for talented interns and full-time engineers to join our team in areas including large language models, reinforcement learning, multimodal algorithms, computer vision, and LLM agent applications. If you are interested, please feel free to send your resume to my email.

News

Nov 2025 We open-sourced ERNIE-4.5-VL-28B-A3B-Thinking: a lightweight VLM (3B active parameters) that matches flagship models and excels in visual reasoning, STEM problem solving, visual grounding, and video understanding, with tool use.
Oct 2025 We open-sourced PaddleOCR-VL: a SOTA 0.9B VLM for document parsing, supporting 109 languages and recognizing complex elements (text, tables, formulas, charts).
Sep 2025 We open-sourced ERNIE-4.5-21B-A3B-Thinking: an enhanced reasoning model with improved performance on logical reasoning, mathematics, science, and coding, featuring tool use and a 128K-token context window.
Aug 2025 I gave a talk at the IJCAI conference on "PaddlePaddle Deep Learning Framework and its Support for Large Model Training and Inference".
Jun 2025 We open-sourced ERNIE 4.5: a family of large-scale multimodal models with 10 variants, including MoE models (47B and 3B active parameters, up to 424B total) and a 0.3B dense model.
May 2025 We open-sourced PaddleOCR 3.0, featuring PP-OCRv5 (high-accuracy text recognition), PP-StructureV3 (general-purpose document parsing), and PP-ChatOCRv4 (intelligent document understanding).

Key Open-Source Projects

PaddleOCR Active

62.9k GitHub stars

A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. It supports 100+ languages with an ultra-lightweight system and powers popular projects such as Umi-OCR, OmniParser, MinerU, and RAGFlow.

PaddleFormers Active

12.9k GitHub stars

An easy-to-use library of pre-trained large language models based on PaddlePaddle. It implements 4D parallel strategies through a unified Trainer API, supports SFT and DPO training, and integrates PEFT, MergeKit, and quantization APIs for efficient LLM development.

ERNIE Active

7.5k GitHub stars

The official repository for ERNIE 4.5, featuring both MoE and dense models across LLM and multimodal architectures. It provides an end-to-end development pipeline for training, compression, and inference, supporting full-cycle industrial deployment.

PaddleX Active

5.9k GitHub stars

An all-in-one, low-code development tool for AI models, built on PaddlePaddle. It integrates over 200 ready-to-use pre-trained models covering OCR, object detection, and time series forecasting, and supports the complete workflow from training to deployment.

PaddleDetection

13.1k GitHub stars

An end-to-end object detection toolkit providing 30+ algorithms and 250+ pre-trained models. It supports object detection, instance segmentation, keypoint detection, and multi-object tracking, with a complete pipeline from development to deployment.

PaddleNLP

12.3k GitHub stars

An easy-to-use NLP library with 45+ architectures and 500+ pre-trained models. It supports a wide range of tasks from research to industrial applications, including neural search, question answering, information extraction, and sentiment analysis.

PaddleSeg

9.2k GitHub stars

An easy-to-use image segmentation library providing 45+ models and 150+ pre-trained models. It supports semantic segmentation, interactive segmentation, panoptic segmentation, image matting, and 3D segmentation, with a complete workflow from labeling to deployment.

PaddleClas

5.8k GitHub stars

A comprehensive toolkit for image classification and recognition. It includes advanced algorithms such as PP-HGNet, PP-LCNetv2, and PP-LCNet, and provides 35 model series with 164 ImageNet pre-trained models for industrial and academic applications.

Publications

arXiv Papers

[A18] PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model (2025)

Cheng Cui, Ting Sun, Suyin Liang, Tingquan Gao, Zelun Zhang, Jiaxuan Liu, Xueqing Wang, Changda Zhou, Hongen Liu, Manhui Lin, Yue Zhang, Yubo Zhang, Handong Zheng, Jing Zhang, Jun Zhang, Yi Liu, Dianhai Yu, Yanjun Ma

[A17] PaddleOCR 3.0 Technical Report (2025)

Cheng Cui, Ting Sun, Manhui Lin, Tingquan Gao, Yubo Zhang, Jiaxuan Liu, Xueqing Wang, Zelun Zhang, Changda Zhou, Hongen Liu, Yue Zhang, Wenyu Lv, Kui Huang, Yichao Zhang, Jing Zhang, Jun Zhang, Yi Liu, Dianhai Yu, Yanjun Ma

[A16] Enabling Versatile Controls for Video Diffusion Models (2025)

Xu Zhang, Hao Zhou, Haoming Qin, Xiaobin Lu, Jiaxing Yan, Guanzhong Wang, Zeyu Chen, Yi Liu

[A15] PP-DocBee2: Improved Baselines with Efficient Data for Multimodal Document Understanding (2025)

Kui Huang, Xinrong Chen, Wenyu Lv, Jincheng Liao, Guanzhong Wang, Yi Liu

[A14] PP-DocBee: Improving Multimodal Document Understanding Through a Bag of Tricks (2025)

Feng Ni, Kui Huang, Yao Lu, Wenyu Lv, Guanzhong Wang, Zeyu Chen, Yi Liu

[A13] PP-FormulaNet: Bridging Accuracy and Efficiency in Advanced Formula Recognition (2025)

Hongen Liu, Cheng Cui, Yuning Du, Yi Liu, Gang Pan

[A12] PP-DocLayout: A Unified Document Layout Detection Model to Accelerate Large-Scale Data Construction (2025)

Ting Sun, Cheng Cui, Yuning Du, Yi Liu

[A11] LOGO: Video Text Spotting with Language Collaboration and Glyph Perception Model (2024)

Hongen Liu, Di Sun, Jiahao Wang, Yi Liu, Gang Pan

[A10] RT-DETRv2: Improved Baseline with Bag-of-Freebies for Real-Time Detection Transformer (2024)

Wenyu Lv, Yian Zhao, Qinyao Chang, Kui Huang, Guanzhong Wang, Yi Liu

[A9] DETRs Beat YOLOs on Real-time Object Detection (2023)

Wenyu Lv, Shangliang Xu, Yian Zhao, Guanzhong Wang, Jinman Wei, Cheng Cui, Yuning Du, Qingqing Dang, Yi Liu

[A8] PP-MobileSeg: Explore the Fast and Accurate Semantic Segmentation Model on Mobile Devices (2023)

Shiyu Tang, Ting Sun, Juncai Peng, Guowei Chen, Yuying Hao, Manhui Lin, Zhihong Xiao, Jiangbin You, Yi Liu

[A7] PP-YOLOE-R: An Efficient Anchor-Free Rotated Object Detector (2022)

Xinxin Wang, Guanzhong Wang, Qingqing Dang, Yi Liu, Xiaoguang Hu, Dianhai Yu

[A6] RAIS: Robust and Accurate Interactive Segmentation via Continual Learning (2022)

Yuying Hao, Yi Liu, Juncai Peng, Haoyi Xiong, Guowei Chen, Shiyu Tang, Zeyu Chen, Baohua Lai

[A5] PP-StructureV2: A Stronger Document Analysis System (2022)

Chenxia Li, Ruoyu Guo, Jun Zhou, Mengtao An, Yuning Du, Lingfeng Zhu, Yi Liu, Xiaoguang Hu, Dianhai Yu

[A4] EISeg: An Efficient Interactive Segmentation Tool based on PaddlePaddle (2022)

Yuying Hao, Yi Liu, Yizhou Chen, Lin Han, Juncai Peng, Shiyu Tang, Guowei Chen, Zewu Wu, Zeyu Chen, Baohua Lai

[A3] PP-Matting: High-Accuracy Natural Image Matting (2022)

Guowei Chen, Yi Liu, Jian Wang, Juncai Peng, Yuying Hao, Lutao Chu, Shiyu Tang, Zewu Wu, Zeyu Chen, Zhiliang Yu, Yuning Du, Qingqing Dang, Xiaoguang Hu, Dianhai Yu

[A2] PP-LiteSeg: A Superior Real-Time Semantic Segmentation Model (2022)

Juncai Peng, Yi Liu, Shiyu Tang, Yuying Hao, Lutao Chu, Guowei Chen, Zewu Wu, Zeyu Chen, Zhiliang Yu, Yuning Du, Qingqing Dang, Baohua Lai, Qiwen Liu, Xiaoguang Hu, Dianhai Yu, Yanjun Ma

[A1] PaddleSeg: A High-Efficient Development Toolkit for Image Segmentation (2021)

Yi Liu, Lutao Chu, Guowei Chen, Zewu Wu, Zeyu Chen, Baohua Lai, Yuying Hao

Conference Papers

[C20] Sortblock: Similarity-Aware Feature Reuse for Diffusion Model (AAAI 2026)

Hanqi Chen, Xu Zhang, Xiaoliu Guan, Lielin Jiang, Guanzhong Wang, Zeyu Chen, Yi Liu

[C19] SUTrack: Towards Simple and Unified Single Object Tracking (AAAI 2025)

Xin Chen, Ben Kang, Wanting Geng, Jiawen Zhu, Yi Liu, Dong Wang, Huchuan Lu

[C18] Exploring Enhanced Contextual Information for Video-Level Object Tracking (AAAI 2025)

Ben Kang, Xin Chen, Simiao Lai, Yang Liu, Yi Liu, Dong Wang

[C17] Distribution-Aware Continual Test-Time Adaptation for Semantic Segmentation (ICRA 2024)

Jiayi Ni, Senqiao Yang, Ran Xu, Jiaming Liu, Xiaoqi Li, Zehui Chen, Ruoxi Qin, Mingliang Xu, Yi Liu

[C16] DETRs Beat YOLOs on Real-time Object Detection (CVPR 2024)

Yian Zhao, Wenyu Lv, Shangliang Xu, Jinman Wei, Guanzhong Wang, Qingqing Dang, Yi Liu, Jie Chen

[C15] Video4MRI: An Empirical Study on Brain Magnetic Resonance Image Analytics with CNN-Based Video Classification Frameworks (ISBI 2023)

Yuxuan Zhang, Qingzhong Wang, Jiang Bian, Yi Liu, Yanwu Xu, Dejing Dou, Haoyi Xiong

[C14] Context Matters: Cross-Domain Cell Detection in Histopathology Images via Contextual Regularization (MIUA 2023)

Ziqi Wen, Qingzhong Wang, Jiang Bian, Xuhong Li, Yi Liu, Haoyi Xiong

[C13] Lightweight Image Super-Resolution with Superpixel Token Interaction (ICCV 2023)

Aiping Zhang, Wenqi Ren, Yi Liu, Xiaochun Cao

[C12] Towards Efficient 3D Human Motion Prediction using Deformable Transformer-based Adversarial Network (ICRA 2022)

Hua Yu, Xuanzhe Fan, Yaqing Hou, Yi Liu, Cai Kang, Dongsheng Zhou, Qiang Zhang

[C11] MUSCLE: Multi-task Self-supervised Continual Learning to Pre-train Deep Models for X-Ray Images of Multiple Body Parts (MICCAI 2022)

Weibin Liao, Haoyi Xiong, Qingzhong Wang, Yan Mo, Xuhong Li, Yi Liu, Zeyu Chen, Siyu Huang, Dejing Dou

[C10] PP-HumanSeg: Connectivity-Aware Portrait Segmentation With a Large-Scale Teleconferencing Video Dataset (WACV 2022)

Lutao Chu, Yi Liu, Zewu Wu, Shiyu Tang, Guowei Chen, Yuying Hao, Juncai Peng, Zhiliang Yu, Zeyu Chen, Baohua Lai, Haoyi Xiong

[C9] EdgeFlow: Achieving Practical Interactive Segmentation with Edge-Guided Flow (ICCV 2021)

Yuying Hao, Yi Liu, Zewu Wu, Lin Han, Yizhou Chen, Guowei Chen, Lutao Chu, Shiyu Tang, Zhiliang Yu, Zeyu Chen, Baohua Lai

[C8] A New Reconstruction Method in Gaze Estimation with Natural Head Movement (MVA 2017)

Yi Liu, Bu-Sung Lee, Andrzej Sluzek, Deepu Rajan, Martin J. McKeown

[C7] Feasibility Analysis of Eye Typing with a Standard Webcam (ECCV Workshop 2016)

Yi Liu, Bu-Sung Lee, Andrzej Sluzek, Deepu Rajan, Martin J. McKeown

[C6] GazeTry: Swipe Text Typing Using Gaze (OzCHI 2015)

Yi Liu, Chi Zhang, Chonho Lee, Bu-Sung Lee, Alex Q. Chen

[C5] A Robust Recognition Approach in Eye-Based Dwell-Free Typing (IEEE PIC 2015)

Yi Liu, Bu-Sung Lee, Martin J. McKeown, Chonho Lee

[C4] Feasibility Analysis and Adaptive Thresholding for Mobile Applications Controlled by EEG Signals (EUSIPCO 2015)

Chonho Lee, Jiawei Chin, Yi Liu, Bu-Sung Lee, Martin J. McKeown

[C3] A Wavelet Entropy-Based Change Point Detection on Network Traffic: A Case Study of Heartbleed Vulnerability (IEEE CCTA 2014)

Chonho Lee, Yi Liu, Lim Hui Tan, Wei Goh, Bu-Sung Lee, Chai Kiat Yeo

[C2] A Motion Accuracy Evaluator Based on Body Parts Movement by MapReduce Video Processing (IEEE BHI 2014)

Chonho Lee, Yoshihiro Terada, Yi Liu, Bu-Sung Lee

[C1] Analysis of Visually Guided Tracking Performance in Parkinson's Disease (IEEE e-Health 2014)

Yi Liu, Chonho Lee, Bu-Sung Lee, John Keith Robert Stevenson, Martin J. McKeown

Journal Papers

[J7] DSDC-GCN: Decoupled Static-Dynamic Co-Occurrence Graph Convolutional Networks for Skeleton-Based Action Recognition (IEEE Transactions on Circuits and Systems for Video Technology, 2025)

Tianming Zhuang, Zhen Qin, Yi Ding, Zhiguang Qin, Ji Geng, Yi Liu, Kim-Kwang Raymond Choo

[J6] EGAvatar: Efficient GAN Inversion for Generalizable Head Avatar From Few-Shot Images (IEEE Transactions on Visualization and Computer Graphics, 2025)

Hao-Pan Ren, Wei Duan, Wan-Yu Li, Yi Liu, Yu-Dong Guo, Shi-Sheng Huang, Ju-Yong Zhang, Hua Huang

[J5] Forecasting When to Forecast: Accelerating Diffusion Models with Confidence-Gated Taylor (Knowledge-Based Systems, 2025)

Xiaoliu Guan, Lielin Jiang, Hanqi Chen, Xu Zhang, Jiaxing Yan, Guanzhong Wang, Yi Liu, Zetao Zhang, Yu Wu

[J4] MTPret: Improving X-Ray Image Analytics With Multitask Pretraining (IEEE Transactions on Artificial Intelligence, 2024)

Weibin Liao, Haoyi Xiong, Qingzhong Wang, Yi Liu, Zeyu Chen, Qinghua Zheng, Dejing Dou

[J3] Distilling Ensemble of Explanations for Weakly-Supervised Pre-Training of Image Segmentation Models (Machine Learning, 2023)

Xuhong Li, Haoyi Xiong, Yi Liu, Dingfu Zhou, Zeyu Chen, Yaqing Wang, Dejing Dou

[J2] CamType: Assistive Text Entry Using Gaze with an Off-the-Shelf Webcam (Machine Vision and Applications, 2019)

Yi Liu, Bu-Sung Lee, Deepu Rajan, Andrzej Sluzek, Martin J. McKeown

[J1] Robust Eye-Based Dwell-Free Typing (International Journal of Human–Computer Interaction, 2016)

Yi Liu, Bu-Sung Lee, Martin J. McKeown