Yanbin Liu

School of Computing
Australian National University

Short bio: Yanbin Liu is a Research Fellow at the School of Computing, Australian National University, working with Prof. Stephen Gould. Previously, he was a Research Fellow at the UWA Center for Medical Research, University of Western Australia, working with Winthrop Prof. Mohammed Bennamoun and Prof. Girish Dwivedi. Before that, he was a Postdoc at AAII, University of Technology Sydney (UTS), working with Prof. Ling Chen. He obtained his PhD from the Australian Artificial Intelligence Institute (AAII), UTS, under the supervision of Prof. Yi Yang. Prior to that, he received his BE and MS degrees from Tianjin University, under the supervision of Prof. Yahong Han and Prof. Jianmin Jiang. His research focuses on machine learning and deep learning for computer vision problems, especially learning with limited labeled data (e.g., few-shot learning).


07 / 2023
I received an academic research grant from the Google Cloud Research Credits Program!
03 / 2023
We have one paper about Few-shot Image Classification accepted by IEEE Transactions on Neural Networks and Learning Systems (TNNLS)!
02 / 2023
We have one paper titled Aligning Step-by-Step Instructional Diagrams to Video Demonstrations accepted by CVPR'23! Congrats to Jiahao!
02 / 2023
We have one paper accepted by PAKDD'23! Congrats to Changlu!
09 / 2022
I started the Research Fellow position at Australian National University.
06 / 2022
Our paper about Feature-Robust Optimal Transport was accepted by ECML/PKDD'22.
09 / 2021
I started the Research Fellow position at University of Western Australia.
07 / 2021
Our paper about Multi-Domain Few-Shot Classification was accepted by ICCV'21.
06 / 2021
Our paper about Mutual Information Estimation was accepted by ECML/PKDD'21.
05 / 2021
I was selected as a CVPR 2021 Outstanding Reviewer.
02 / 2021
I started the Postdoc position at University of Technology Sydney.
06 / 2020
Our paper about Semantic Correspondence was accepted by CVPR'20.
05 / 2019
I visited Statistical Modelling Team @ RIKEN AIP as a research intern.
12 / 2018
Our paper about Transductive Few-shot Learning was accepted by ICLR'19.
11 / 2018
Our paper about Online Feature Selection was accepted by AAAI'19.
01 / 2018
I joined AITRICS (South Korea) as a research intern.
05 / 2017
Our team ranked 6th (top 1%) at the Google Cloud & YouTube-8M Video Understanding Challenge.
03 / 2017
I enrolled as a PhD student at the University of Technology Sydney.

Research (selected papers)

See Google Scholar profile for a full list of publications.

Bilaterally-normalized Scale-consistent Sinkhorn Distance for Few-shot Image Classification
Yanbin Liu, Linchao Zhu, Xiaohan Wang, Makoto Yamada and Yi Yang
IEEE Transactions on Neural Networks and Learning Systems (TNNLS) 2023.
  author      = {Liu, Yanbin and Zhu, Linchao and Wang, Xiaohan and Yamada, Makoto and Yang, Yi},
  journal     = {IEEE Transactions on Neural Networks and Learning Systems}, 
  title       = {Bilaterally-normalized Scale-consistent Sinkhorn Distance for Few-shot Image Classification}, 
  year        = {2023},
  doi         = {10.1109/TNNLS.2023.3262351}}            

Few-shot image classification aims at exploring transferable features from base classes to recognize images of the unseen novel classes with only a few labelled images. Existing methods usually compare the support features and query features, which are implemented by either matching the global feature vectors or matching the local feature maps at the same position. However, the few labelled images fail to capture all the diverse context and intra-class variations, leading to mismatch issues for existing methods. On one hand, due to the misaligned position and cluttered background, existing methods suffer from the object mismatch issue (Fig. 1(a)). On the other hand, due to the scale inconsistency between images, existing methods suffer from the scale mismatch issue (Fig. 1(b)). In this paper, we propose the Bilaterally-normalized Scale-consistent Sinkhorn Distance (BSSD) to solve these issues. Firstly, instead of same-position matching, we utilize the Sinkhorn Distance to find an optimal matching between images, mitigating the object mismatch caused by misaligned position. Meanwhile, we propose the intra-image and inter-image attentions as the bilateral normalization on Sinkhorn Distance to suppress the object mismatch caused by background clutter. Secondly, local feature maps are enhanced with the multi-scale pooling strategy, making it possible for the Sinkhorn Distance to find a consistent matching scale between images. Experimental results show the effectiveness of the proposed approach, and we achieve state-of-the-art results on three few-shot benchmarks.
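
As a rough illustration of the matching step described in the abstract above, the following NumPy sketch computes an entropy-regularized (Sinkhorn) transport plan between the local features of a support image and a query image. It is not the BSSD implementation: the attention-based bilateral normalization and the multi-scale pooling are omitted, and the function names, the cosine cost, the uniform marginals and the value of `eps` are all assumptions made for this example.

```python
import numpy as np

def sinkhorn_plan(cost, a, b, eps=0.1, n_iters=200):
    """Entropy-regularized optimal transport plan via Sinkhorn iterations."""
    K = np.exp(-cost / eps)              # Gibbs kernel of the cost matrix
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)                  # rescale rows towards marginal a
        v = b / (K.T @ u)                # rescale columns towards marginal b
    return u[:, None] * K * v[None, :]

def sinkhorn_distance(support_feats, query_feats, eps=0.1):
    """Transport cost between two sets of local features (one row per position)."""
    s = support_feats / np.linalg.norm(support_feats, axis=1, keepdims=True)
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    cost = 1.0 - s @ q.T                 # cosine cost (assumed for this sketch)
    a = np.full(len(s), 1.0 / len(s))    # uniform marginals (assumed)
    b = np.full(len(q), 1.0 / len(q))
    plan = sinkhorn_plan(cost, a, b, eps)
    return float((plan * cost).sum()), plan

# Toy data: 9 local positions with 16-dimensional features per image.
rng = np.random.default_rng(0)
support = rng.normal(size=(9, 16))
query = rng.normal(size=(9, 16))
dist, plan = sinkhorn_distance(support, query)
```

Because the plan may match any support position to any query position, an object that appears at different locations in the two images can still be matched, which is the motivation for replacing same-position matching.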

Aligning Step-by-Step Instructional Diagrams to Video Demonstrations
Jiahao Zhang, Anoop Cherian, Yanbin Liu, Yizhak Ben-Shabat, Cristian Rodriguez and Stephen Gould
CVPR 2023.
        author    = {Zhang, Jiahao and Cherian, Anoop and Liu, Yanbin and Ben-Shabat, Yizhak and Rodriguez, Cristian and Gould, Stephen},
        title     = {Aligning Step-by-Step Instructional Diagrams to Video Demonstrations},
        booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
        year      = {2023},

Multimodal alignment facilitates the retrieval of instances from one modality when queried using another. In this paper, we consider a novel setting where such an alignment is between (i) instruction steps that are depicted as assembly diagrams (commonly seen in Ikea assembly manuals) and (ii) segments from in-the-wild videos; these videos comprising an enactment of the assembly actions in the real world. We introduce a supervised contrastive learning approach that learns to align videos with the subtle details of assembly diagrams, guided by a set of novel losses. To study this problem and evaluate the effectiveness of our method, we introduce a new dataset, IAW (for Ikea assembly in the wild), consisting of 183 hours of videos from diverse furniture assembly collections and nearly 8,300 illustrations from their associated instruction manuals and annotated for their ground truth alignments. We define two tasks on this dataset: First, nearest neighbor retrieval between video segments and illustrations, and, second, alignment of instruction steps and the segments for each video. Extensive experiments on IAW demonstrate superior performance of our approach against alternatives.

A Multi-Mode Modulator for Multi-Domain Few-Shot Classification
Yanbin Liu, Juho Lee, Linchao Zhu, Ling Chen, Humphrey Shi and Yi Yang
ICCV 2021.
  title     = {A Multi-Mode Modulator for Multi-Domain Few-Shot Classification},
  author    = {Liu, Yanbin and Lee, Juho and Zhu, Linchao and Chen, Ling and Shi, Humphrey and Yang, Yi},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  month     = {October},
  year      = {2021},
  pages     = {8453-8462}

Most existing few-shot classification methods only consider generalization on one dataset (i.e., single-domain), failing to transfer across various seen and unseen domains. In this paper, we consider the more realistic multi-domain few-shot classification problem to investigate the cross-domain generalization. Two challenges exist in this new setting: (1) how to efficiently generate multi-domain feature representation, and (2) how to explore domain correlations for better cross-domain generalization. We propose a parameter-efficient multi-mode modulator to address both challenges. First, the modulator is designed to maintain multiple modulation parameters (one for each domain) in a single network, thus achieving single-network multi-domain representation. Given a particular domain, domain-aware features can be efficiently generated with the well-devised separative selection and cooperative query modules. Second, we further divide the modulation parameters into the domain-specific set and the domain-cooperative set to explore the intra-domain information and inter-domain correlations, respectively. The intra-domain information describes each domain independently to prevent negative interference. The inter-domain correlations guide information sharing among relevant domains to enrich their own representation. Moreover, unseen domains can utilize the correlations to obtain an adaptive combination of seen domains for extrapolation. We demonstrate that the proposed multi-mode modulator achieves state-of-the-art results on the challenging META-DATASET benchmark, especially for unseen test domains.

LSMI-Sinkhorn: Semi-supervised Mutual Information Estimation with Optimal Transport
Yanbin Liu*, Makoto Yamada*, Yao-Hung Hubert Tsai, Tam Le, Ruslan Salakhutdinov and Yi Yang
ECML/PKDD 2021.
  title     = {LSMI-Sinkhorn: Semi-supervised Mutual Information Estimation with Optimal Transport},
  author    = {Liu, Yanbin and Yamada, Makoto and Tsai, Yao-Hung Hubert and Le, Tam and Salakhutdinov, Ruslan and Yang, Yi},
  booktitle = {ECML/PKDD},
  year      = {2021}

Estimating mutual information is an important machine learning and statistics problem. To estimate the mutual information from data, a common practice is preparing a set of paired samples $\{(\boldsymbol{x}_i, \boldsymbol{y}_i)\}_{i=1}^n \overset{\text{i.i.d.}}{\sim} p(\boldsymbol{x}, \boldsymbol{y})$. However, in many situations, it is difficult to obtain a large number of data pairs. To address this problem, we propose the semi-supervised Squared-loss Mutual Information (SMI) estimation method using a small number of paired samples and the available unpaired ones. We first represent SMI through the density ratio function, where the expectation is approximated by the samples from marginals and its assignment parameters. The objective is formulated using the optimal transport problem and quadratic programming. Then, we introduce the Least-Squares Mutual Information with Sinkhorn (LSMI-Sinkhorn) algorithm for efficient optimization. Through experiments, we first demonstrate that the proposed method can estimate the SMI without a large number of paired samples. Then, we evaluate and show the effectiveness of the proposed LSMI-Sinkhorn algorithm on various types of machine learning problems such as image matching and photo album summarization.
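
For readers unfamiliar with SMI, the standard definition in terms of the density ratio is sketched below. The symbol $r$ for the density ratio is notation chosen for this note and may differ from the paper's; the second line follows by expanding the square.

```latex
\mathrm{SMI}(X, Y)
  = \frac{1}{2} \iint p(\boldsymbol{x})\, p(\boldsymbol{y})
    \bigl( r(\boldsymbol{x}, \boldsymbol{y}) - 1 \bigr)^2
    \,\mathrm{d}\boldsymbol{x}\, \mathrm{d}\boldsymbol{y},
\qquad
r(\boldsymbol{x}, \boldsymbol{y})
  = \frac{p(\boldsymbol{x}, \boldsymbol{y})}{p(\boldsymbol{x})\, p(\boldsymbol{y})}

\mathrm{SMI}(X, Y)
  = \frac{1}{2}\, \mathbb{E}_{p(\boldsymbol{x}, \boldsymbol{y})}
    \bigl[ r(\boldsymbol{x}, \boldsymbol{y}) \bigr] - \frac{1}{2}
```

SMI is zero exactly when $r \equiv 1$, that is, when $\boldsymbol{x}$ and $\boldsymbol{y}$ are independent. The second form shows why paired samples matter: the expectation is taken over the joint distribution.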

Semantic Correspondence as an Optimal Transport Problem
Yanbin Liu, Linchao Zhu, Makoto Yamada and Yi Yang
CVPR 2020.
  title     = {Semantic Correspondence as an Optimal Transport Problem},
  author    = {Liu, Yanbin and Zhu, Linchao and Yamada, Makoto and Yang, Yi},
  booktitle = {CVPR},
  year      = {2020}

Establishing dense correspondences across semantically similar images is a challenging task. Due to the large intra-class variation and background clutter, two common issues occur in current approaches. First, many pixels in a source image are assigned to one target pixel, i.e., many-to-one matching. Second, some object pixels are assigned to the background pixels, i.e., background matching. We solve the first issue by global feature matching, which maximizes the total matching correlations between images to obtain a global optimal matching matrix. The row sum and column sum constraints are enforced on the matching matrix to induce a balanced solution, thus suppressing the many-to-one matching. We solve the second issue by applying a staircase function on the class activation maps to re-weight the importance of pixels into four levels from foreground to background. The whole procedure is combined into a unified optimal transport algorithm by converting the maximization problem to the optimal transport formulation and incorporating the staircase weights into the optimal transport algorithm to act as empirical distributions. The proposed algorithm achieves state-of-the-art performance on four benchmark datasets. Notably, a 26% relative improvement is achieved on the large-scale SPair-71k dataset.
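
The staircase re-weighting mentioned in the abstract above can be pictured with a small NumPy sketch. Only the idea of quantizing a class activation map into four levels and normalizing the result into a distribution is taken from the abstract. The function name, the thresholds and the level values below are hypothetical and chosen only for illustration.

```python
import numpy as np

def staircase_weights(cam,
                      thresholds=(0.25, 0.5, 0.75),      # hypothetical cut points
                      levels=(0.25, 0.5, 0.75, 1.0)):     # hypothetical level values
    """Quantize a class activation map into four levels and normalize to sum to 1."""
    cam = np.asarray(cam, dtype=float)
    span = cam.max() - cam.min()
    # Min-max normalize to [0, 1]; a constant map falls back to the lowest level.
    norm = (cam - cam.min()) / span if span > 0 else np.zeros_like(cam)
    idx = np.digitize(norm, thresholds)                  # level index in {0, 1, 2, 3}
    weights = np.asarray(levels)[idx]
    return weights / weights.sum()                       # usable as an OT marginal

cam = np.array([0.0, 0.3, 0.6, 1.0])                     # toy activations for 4 pixels
mu = staircase_weights(cam)
```

Pixels with low activation (likely background) receive a small share of the total mass, so an optimal transport solver using `mu` as its marginal has little mass to assign to them.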

Learning to Propagate Labels: Transductive Propagation Network for Few-shot Learning
Yanbin Liu, Juho Lee, Minseop Park, Saehoon Kim, Eunho Yang, Sungju Hwang and Yi Yang
ICLR 2019.
  title     = {Learning to propagate labels: Transductive propagation network for few-shot learning},
  author    = {Liu, Yanbin and Lee, Juho and Park, Minseop and Kim, Saehoon and Yang, Eunho and Hwang, Sung Ju and Yang, Yi},
  booktitle = {ICLR},
  year      = {2019}

The goal of few-shot learning is to learn a classifier that generalizes well even when trained with a limited number of training instances per class. The recently introduced meta-learning approaches tackle this problem by learning a generic classifier across a large number of multiclass classification tasks and generalizing the model to a new task. Yet, even with such meta-learning, the low-data problem in the novel classification task still remains. In this paper, we propose Transductive Propagation Network (TPN), a novel meta-learning framework for transductive inference that classifies the entire test set at once to alleviate the low-data problem. Specifically, we propose to learn to propagate labels from labeled instances to unlabeled test instances, by learning a graph construction module that exploits the manifold structure in the data. TPN jointly learns both the parameters of feature embedding and the graph construction in an end-to-end manner. We validate TPN on multiple benchmark datasets, on which it largely outperforms existing few-shot learning approaches and achieves the state-of-the-art results.
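
To make the idea of propagating labels over a graph concrete, here is a minimal NumPy sketch of classical closed-form label propagation on a Gaussian similarity graph. It is an illustration under assumptions, not the TPN code: TPN learns the feature embedding and the graph construction end-to-end, whereas this sketch uses fixed features and a fixed, hand-picked `sigma` and `alpha`.

```python
import numpy as np

def propagate_labels(feats, labels_onehot, alpha=0.99, sigma=1.0):
    """Closed-form label propagation: F = (I - alpha * S)^{-1} Y."""
    # Pairwise squared distances and Gaussian similarities (fixed sigma is an assumption).
    sq = ((feats[:, None, :] - feats[None, :, :]) ** 2).sum(-1)
    W = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)                       # no self-loops
    d = W.sum(axis=1)
    S = W / np.sqrt(np.outer(d, d))                # symmetric normalization D^-1/2 W D^-1/2
    n = len(feats)
    F = np.linalg.solve(np.eye(n) - alpha * S, labels_onehot)
    return F.argmax(axis=1)

# Two small clusters; only the first point of each cluster is labeled.
feats = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
                  [5.0, 5.0], [5.2, 4.9], [4.9, 5.1]])
Y = np.zeros((6, 2))
Y[0, 0] = 1.0                                      # labeled example of class 0
Y[3, 1] = 1.0                                      # labeled example of class 1
pred = propagate_labels(feats, Y)
```

All unlabeled points are classified jointly in one linear solve, which is what makes the inference transductive.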

Adaptive Sparse Confidence-Weighted Learning for Online Feature Selection
Yanbin Liu, Yan Yan, Ling Chen, Yahong Han and Yi Yang
AAAI 2019.
  title     = {Adaptive sparse confidence-weighted learning for online feature selection},
  author    = {Liu, Yanbin and Yan, Yan and Chen, Ling and Han, Yahong and Yang, Yi},
  booktitle = {AAAI},
  year      = {2019}

In this paper, we propose a new online feature selection algorithm for streaming data. We aim to focus on the following two problems which remain unaddressed in literature. First, most existing online feature selection algorithms merely utilize the first-order information of the data streams, regardless of the fact that second-order information explores the correlations between features and significantly improves the performance. Second, most online feature selection algorithms are based on the balanced data presumption, which is not true in many real-world applications. For example, in fraud detection, the number of positive examples is much smaller than the number of negative examples because most cases are not fraud. The balanced assumption will make the selected features biased towards the majority class and fail to detect the fraud cases. We propose an Adaptive Sparse Confidence-Weighted (ASCW) algorithm to solve the aforementioned two problems. We first introduce an l0-norm constraint into the second-order confidence-weighted (CW) learning for feature selection. Then the original loss is substituted with a cost-sensitive loss function to address the imbalanced data issue. Furthermore, our algorithm maintains multiple sparse CW learners with the corresponding cost vector to dynamically select an optimal cost. We theoretically enhance the theory of sparse CW learning and analyze the performance behavior in F-measure. Empirical studies show the superior performance over the state-of-the-art online learning methods in the online-batch setting.

Pooling the Convolutional Layers in Deep ConvNets for Video Action Recognition*
Shichao Zhao, Yanbin Liu, Yahong Han, Richang Hong, Qinghua Hu and Qi Tian
IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) 2018.
  title   = {Pooling the convolutional layers in deep convnets for video action recognition},
  author  = {Zhao, Shichao and Liu, Yanbin and Han, Yahong and Hong, Richang and Hu, Qinghua and Tian, Qi},
  journal = {IEEE Transactions on Circuits and Systems for Video Technology},
  volume  = {28},
  number  = {8},
  pages   = {1839--1849},
  year    = {2018},

Deep ConvNets have shown their good performance in image classification tasks. However, there still remain problems in deep video representations for action recognition. On one hand, current video ConvNets are relatively shallow compared with image ConvNets, which limits their capability of capturing the complex video action information; on the other hand, temporal information of videos is not properly utilized to pool and encode the video sequences. Toward these issues, in this paper we utilize two state-of-the-art ConvNets, i.e., the very deep spatial net (VGGNet [1]) and the temporal net from Two-Stream ConvNets [2], for action representation. The convolutional layers and the proposed new layer, called frame-diff layer, are extracted and pooled with two temporal pooling strategies: Trajectory pooling and Line pooling. The pooled local descriptors are then encoded with vector of locally aggregated descriptors (VLAD) [3] to form the video representations. In order to verify the effectiveness of the proposed framework, we conduct experiments on UCF101 and HMDB51 data sets. It achieves an accuracy of 92.08% on UCF101, which is the state-of-the-art, and an accuracy of 65.62% on HMDB51, which is comparable to the state-of-the-art. In addition, we propose the new Line pooling strategy, which can speed up the extraction of features and achieve performance comparable to that of Trajectory pooling.
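
The VLAD encoding step mentioned in the abstract above can be sketched in a few lines of NumPy. This is a generic, simplified VLAD (nearest-centroid assignment, summed residuals, a single L2 normalization) and not the paper's pipeline. The codebook is assumed to be given, for example from k-means, and the function name is chosen for this example.

```python
import numpy as np

def vlad_encode(descriptors, centroids):
    """Encode local descriptors as a VLAD vector of length k * dim."""
    # Assign each descriptor to its nearest codebook centroid.
    d2 = ((descriptors[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    assign = d2.argmin(axis=1)
    k, dim = centroids.shape
    vlad = np.zeros((k, dim))
    for c in range(k):
        members = descriptors[assign == c]
        if len(members):
            vlad[c] = (members - centroids[c]).sum(axis=0)   # sum of residuals
    vlad = vlad.ravel()
    norm = np.linalg.norm(vlad)
    return vlad / norm if norm > 0 else vlad                 # global L2 normalization

# Toy example: 3 pooled local descriptors and a 2-word codebook in 2-D.
centroids = np.array([[0.0, 0.0], [10.0, 10.0]])
descriptors = np.array([[1.0, 0.0], [0.0, 1.0], [11.0, 10.0]])
code = vlad_encode(descriptors, centroids)
```

The output has a fixed length regardless of how many local descriptors a video yields, which is what allows videos of different durations to be compared with a standard classifier.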

Academic Service

Guest Editor
Journal Reviewer
  • IEEE Transactions on Image Processing (TIP)
  • IEEE Transactions on Knowledge and Data Engineering (TKDE)
  • IEEE Transactions on Neural Networks and Learning Systems (TNNLS)
  • IEEE Transactions on Circuits and Systems for Video Technology (TCSVT)
  • IEEE Transactions on Emerging Topics in Computational Intelligence (TETCI)
  • International Journal of Computer Vision (IJCV)
  • Transactions on Machine Learning Research (TMLR)
  • Pattern Recognition (PR)
  • ACM Transactions on Multimedia Computing, Communications and Applications (TOMM)
  • Knowledge-Based Systems (KNOSYS)
  • Neurocomputing (NEUCOM)
Conference Reviewer
  • CVPR 2021 (Outstanding Reviewer), 2022, 2023
  • ICLR 2022-2023
  • ICCV 2021, 2023
  • ICML 2020-2022
  • NeurIPS 2020-2023
  • AAAI 2021-2022
  • IJCAI 2021
  • AISTATS 2022


Google Cloud & YouTube-8M Video Understanding Challenge
THUMOS Challenge 2015 (Action Recognition Task)
  • Yanbin Liu, Baixiang Fan, Shichao Zhao, Youjiang Xu and Yahong Han
  • Ranked 7th at the action recognition task.
The Second Big Data Technology Innovation Competition
  • Yanbin Liu, Chengyue Zhang, Guang Li, Shichao Zhao, Youjiang Xu and Yahong Han
  • Ranked 1st at the video retrieval track.


Friends & Collaborators
Linchao Zhu (UTS), Juho Lee (KAIST), Makoto Yamada (Kyoto University & RIKEN-AIP), Yan Yan (Washington State University), Xin Yu (UTS)

Copyright © Yanbin Liu  /  Last update 2021