Justin (Zhaocong) Yuan

I obtained my MASc degree in RL and Robotics from University of Toronto, advised by Angela Schoellig at Dynamic Systems Lab (DSL), also part of Vector Institute and UofT Robotics Institute. Previously, I received my BASc degree in Engineering Science from UofT. I was also an intern in Nvidia Toronto AI lab advised by Sanja Fidler, Apple Siri team based in Seattle, and Data-Driven Decision Making Lab advised by Scott Sanner. My prior works largely focus on safe RL, transfer learning, and Sim-to-Real tasks.

I currently work at Qualcomm and my research of focus spans across small language models (SLM), multimodal foundation models (vision & speech), LLM inference and optimization (e.g. speculative decoding, on-device acceleration), and more recently diffusion large language models (dLLM).

In my leisure time, I play fingerstyle guitar and badminton .

news

Sep 24, 2025	Paper “OmniDraft: A Cross-vocabulary, Online Adaptive Drafter for On-device Speculative Decoding “ accepted at NeurIPS 2025. [link]
Sep 25, 2024	Paper “Stepping Forward on the Last Mile” accepted at NeurIPS 2024. [link]
Feb 21, 2023	Join Qualcomm as a Machine Learning Research Engineer on the Embedded AI team.
Nov 10, 2022	Graduate from University of Toronto, MASc
Oct 23, 2022	IROS 2022 presentation of our safe-control-gym paper. [link]
May 27, 2022	ICRA 2022 Workshop on Releasing Robots into the Wild: Simulations, Benchmarks, and Deployment (organizer). website/videos
Mar 15, 2022	Vector Institute Industry Workshop (guest speaker).
May 04, 2020	Graduate from University of Toronto, BASc, Engineering Science
Sep 03, 2018	Intern at Nvidia, Toronto, deep learning research on Sim-to-Real transfer and self-driving
May 27, 2018	Intern at Apple, Seattle, on the Siri NLU team

latest posts

Oct 16, 2024	Vector Quantization - A Quick Dive

selected publications

NeurIPS

OmniDraft: A Cross-vocabulary, Online Adaptive Drafter for On-device Speculative Decoding

Ramchalam Kinattinkara Ramakrishnan, Zhaocong Yuan, Shaojie Zhuo, Chen Feng, Yicheng Lin, Chenzheng Su, and Xiaopeng Zhang

arXiv preprint arXiv:2507.02659, 2025

Abstract PDF

Speculative decoding generally dictates having a small, efficient draft model that is either pretrained or distilled offline to a particular target model series, for instance, Llama or Qwen models. However, within online deployment settings, there are two major challenges: 1) usage of a target model that is incompatible with the draft model; 2) expectation of latency improvements over usage and time. In this work, we propose OmniDraft, a unified framework that enables a single draft model to operate with any target model and adapt dynamically to user data. We introduce an online n-gram cache with hybrid distillation fine-tuning to address the cross-vocabulary mismatch across draft and target models; and further improve decoding speed by leveraging adaptive drafting techniques. OmniDraft is particularly suitable for on-device LLM applications where model cost, efficiency and user customization are the major points of contention. This further highlights the need to tackle the above challenges and motivates the \textit“one drafter for all” paradigm. We showcase the proficiency of the OmniDraft framework by performing online learning on math reasoning, coding and text generation tasks. Notably, OmniDraft enables a single Llama-68M model to pair with various target models including Vicuna-7B, Qwen2-7B and Llama3-8B models for speculative decoding; and additionally provides up to 1.5-2x speedup.
arXiv

Edge-ASR: Towards Low-Bit Quantization of Automatic Speech Recognition Models

Chen Feng, Yicheng Lin, Shaojie Zhuo, Chenzheng Su, Ramchalam Kinattinkara Ramakrishnan, Zhaocong Yuan, and Xiaopeng Zhang

arXiv preprint arXiv:2507.07877, 2025

Abstract PDF

Recent advances in Automatic Speech Recognition (ASR) have demonstrated remarkable accuracy and robustness in diverse audio applications, such as live transcription and voice command processing. However, deploying these models on resource-constrained edge devices (e.g., IoT device, wearables) still presents substantial challenges due to strict limits on memory, compute and power. Quantization, particularly Post-Training Quantization (PTQ), offers an effective way to reduce model size and inference cost without retraining. Despite its importance, the performance implications of various advanced quantization methods and bit-width configurations on ASR models remain unclear. In this work, we present a comprehensive benchmark of eight state-of-the-art (SOTA) PTQ methods applied to two leading edge-ASR model families, Whisper and Moonshine. We systematically evaluate model performances (i.e., accuracy, memory I/O and bit operations) across seven diverse datasets from the open ASR leader-board, analyzing the impact of quantization and various configurations on both weights and activations. Built on an extension of the LLM compression toolkit, our framework integrates edge-ASR models, diverse advanced quantization algorithms, a unified calibration and evaluation data pipeline, with detailed analysis tools. Our results characterize the trade-offs between efficiency and accuracy, demonstrating that even 3-bit quantization can succeed on high capacity models when using advanced PTQ techniques. These findings provide valuable insights for optimizing ASR models on low-power, always-on edge devices.
NeurIPS

Stepping forward on the last mile

Chen Feng , Jay Zhuo , Parker Zhang, Ramchalam Kinattinkara Ramakrishnan, Zhaocong Yuan, and Andrew Zou Li

Advances in Neural Information Processing Systems, 2024

Abstract PDF

Continuously adapting pre-trained models to local data on resource constrained edge devices is the \emphlast mile for model deployment. However, as models increase in size and depth, backpropagation requires a large amount of memory, which becomes prohibitive for edge devices. In addition, most existing low power neural processing engines (e.g., NPUs, DSPs, MCUs, etc.) are designed as fixed-point inference accelerators, without training capabilities. Forward gradients, solely based on directional derivatives computed from two forward calls, have been recently used for model training, with substantial savings in computation and memory. However, the performance of quantized training with fixed-point forward gradients remains unclear. In this paper, we investigate the feasibility of on-device training using fixed-point forward gradients, by conducting comprehensive experiments across a variety of deep learning benchmark tasks in both vision and audio domains. We propose a series of algorithm enhancements that further reduce the memory footprint, and the accuracy gap compared to backpropagation. An empirical study on how training with forward gradients navigates in the loss landscape is further explored. Our results demonstrate that on the last mile of model customization on edge devices, training with fixed-point forward gradients is a feasible and practical approach.
IROS

safe-Control-Gym: A Unified Benchmark Suite for Safe Learning-Based Control and Reinforcement Learning in Robotics

Zhaocong Yuan, Adam W Hall, Siqi Zhou, Lukas Brunke, Melissa Greeff, Jacopo Panerati, and Angela P Schoellig

IEEE Robotics and Automation Letters, 2022

Abstract PDF Code

In recent years, both reinforcement learning and learning-based control—as well as the study of their safety, which is crucial for deployment in real-world robots—have gained significant traction. However, to adequately gauge the progress and applicability of new results, we need the tools to equitably compare the approaches proposed by the controls and reinforcement learning communities. Here, we propose a new open-source benchmark suite, called safe-control-gym, supporting both model-based and databased control techniques. We provide implementations for three dynamic systems—the cart-pole, the 1D, and 2D quadrotor— and two control tasks—stabilization and trajectory tracking. We propose to extend OpenAI’s Gym API—the de facto standard in reinforcement learning research—with (i) the ability to specify (and query) symbolic dynamics and (ii) constraints, and (iii) (repeatably) inject simulated disturbances in the control inputs, state measurements, and inertial properties. To demonstrate our proposal and in an attempt to bring research communities closer together, we show how to use safe-control-gym to quantitatively compare the control performance, data efficiency, and safety of multiple approaches from the fields of traditional control, learning-based control, and reinforcement learning.
Annual Reviews

Safe learning in robotics: From learning-based control to safe reinforcement learning

Lukas Brunke, Melissa Greeff, Adam W Hall, Zhaocong Yuan, Siqi Zhou, Jacopo Panerati, and Angela P Schoellig

Annual Review of Control, Robotics, and Autonomous Systems, 2022

Abstract PDF Code

The last half decade has seen a steep rise in the number of contributions on safe learning methods for real-world robotic deployments from both the control and reinforcement learning communities. This article provides a concise but holistic review of the recent advances made in using machine learning to achieve safe decision-making under uncertainties, with a focus on unifying the language and frameworks used in control theory and reinforcement learning research. It includes learning-based control approaches that safely improve performance by learning the uncertain dynamics, reinforcement learning approaches that encourage safety or robustness, and methods that can formally certify the safety of a learned control policy. As data- and learning-based robot control methods continue to gain traction, researchers must understand when and how to best leverage them in real-world scenarios where safety is imperative, such as when operating in close proximityto humans. We highlight some of the open challenges that will drive the field of robot learning in the coming years, and emphasize the need for realistic physics-based benchmarks to facilitate fair comparisons between control and reinforcement learning approaches.
ICCV

Meta-sim: Learning to generate synthetic datasets

Amlan Kar, Aayush Prakash, Ming-Yu Liu, Eric Cameracci, Justin Yuan, Matt Rusiniak, David Acuna, Antonio Torralba, and Sanja Fidler

In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019

Awarded Abstract PDF Code Website

Oral

Training models to high-end performance requires availability of large labeled datasets, which are expensive to get. The goal of our work is to automatically synthesize labeled datasets that are relevant for a downstream task. We propose Meta-Sim, which learns a generative model of synthetic scenes, and obtain images as well as its corresponding ground-truth via a graphics engine. We parametrize our dataset generator with a neural network, which learns to modify attributes of scene graphs obtained from probabilistic scene grammars, so as to minimize the distribution gap between its rendered outputs and target data. If the real dataset comes with a small labeled validation set, we additionally aim to optimize a meta-objective, i.e. downstream task performance. Experiments show that the proposed method can greatly improve content generation quality over a human-engineered probabilistic scene grammar, both qualitatively and quantitatively as measured by performance on a downstream task.