CHU, WEI（褚崴）

Representative Journal Articles

W. Chu and S. S. Keerthi (2007) Support vector ordinal regression, Neural Computation 19(3):792-815

In this paper, we propose two new support vector formulations for ordinal regression, which optimize multiple thresholds to define parallel discriminant hyperplanes for the ordinal scales. Both approaches guarantee that the thresholds are properly ordered at the optimal solution.
W. Chu, Z. Ghahramani, A. Podtelezhnikov and D. L. Wild (2006) Bayesian segmental models with multiple sequence alignment profiles for protein secondary structure and contact map prediction, IEEE/ACM Transactions on Computational Biology and Bioinformatics 3(2):98-113

In this paper, we develop a segmental semi-Markov model (SSMM) for protein secondary structure prediction which incorporates multiple sequence alignment profiles with the purpose of improving the predictive performance. By incorporating the information from long range interactions in beta-sheets, this model is also capable of carrying out inference on contact maps. [ps][supplement]
W. Chu and Z. Ghahramani (2005) Gaussian processes for ordinal regression, Journal of Machine Learning Research 6(Jul):1019-1041

In this paper, we present a probabilistic approach to ordinal regression in Gaussian processes. In the Bayesian framework of Gaussian processes, we propose a likelihood function for ordinal variables that is a generalization of the probit function. Two inference techniques, based on Laplace approximation and expectation propagation respectively, are applied for model selection. [pdf] [ps] [zip] [code]
W. Chu, S. S. Keerthi and C. J. Ong (2004) Bayesian support vector regression using a unified loss function, IEEE Transactions on Neural Networks 15(1):29-44

In this paper, we use the soft insensitive loss function in likelihood evaluation and describe a Bayesian framework in a stationary Gaussian process. Bayesian methods are used to implement model adaptation while keeping the merits of support vector regression, such as quadratic programming and sparseness. Moreover, confidence intervals are provided in prediction. [pdf] [ps] [zip] [code]
W. Chu, S. S. Keerthi and C. J. Ong (2003) Bayesian trigonometric support vector classifier, Neural Computation 15(9):2227-2254

In this paper, we propose a Bayesian support vector classifier by introducing a novel likelihood function, known as the trigonometric likelihood function. Model adaptation and ARD feature selection can be implemented intrinsically in hyperparameter inference. Another benefit is the availability of class probabilities in making predictions. [pdf] [code]

Representative Conference Papers

W. Chu, M. Zinkevich, L. Li, A. Thomas, and B. Tseng (2011) Unbiased online active learning in data streams, ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD-17)

Unlabeled samples can be intelligently selected for labeling to minimize classification error. In many real-world applications, a large number of unlabeled samples arrive in a streaming manner, making it impossible to maintain all the data in a candidate pool. In this work, we consider the unbiasedness property in the sampling process, and design optimal instrumental distributions to minimize the variance in the stochastic process. Meanwhile, Bayesian linear classifiers with weighted maximum likelihood are optimized online to estimate parameters. [pdf]
W. Chu, V. Sindhwani, Z. Ghahramani and S. S. Keerthi (2006) Relational learning with Gaussian processes, Neural Information Processing Systems (NIPS-19):289-296

Correlation between instances is often modelled via a kernel function using input attributes of the instances. Relational knowledge can further reveal additional pairwise correlations between variables of interest. In this paper, we develop a class of models which incorporates both reciprocal relational information and input attributes using Gaussian process techniques. This approach provides a novel non-parametric Bayesian framework with a data-dependent prior for supervised learning tasks. We also apply this framework to semi-supervised learning. Experimental results on several real world data sets verify the usefulness of this algorithm. [pdf]
W. Chu and Z. Ghahramani (2005) Preference learning with Gaussian processes, International Conference on Machine Learning (ICML-22):137-144

In this paper, we propose a probabilistic kernel approach to preference learning based on Gaussian processes. A new likelihood function is proposed to capture the preference relations in the Bayesian framework. The generalized formulation is also applicable to many multiclass problems. [pdf] [ps] [zip] [code]
W. Chu and S. S. Keerthi (2005) New approaches to support vector ordinal regression, International Conference on Machine Learning (ICML-22):145-152

In this paper, we propose two new support vector formulations for ordinal regression, which optimize multiple thresholds to define parallel discriminant hyperplanes for the ordinal scales. Both approaches guarantee that the thresholds are properly ordered at the optimal solution. [pdf] [ps] [zip] [code]

Natural Language Processing Papers

L. Chao, J. He, T. Wang and W. Chu (2021) PairRE: Knowledge graph embeddings via paired relation vectors, ACL 2021: 4360-4369

Distance-based knowledge graph embedding methods show promising results on the link prediction task, on which two topics have been widely studied: one is the ability to handle complex relations, such as N-to-1, 1-to-N and N-to-N; the other is the ability to encode various relation patterns, such as symmetry/antisymmetry. However, existing methods fail to solve these two problems at the same time, which leads to unsatisfactory results. To mitigate this problem, we propose PairRE, a model with paired vectors for each relation representation. The paired vectors enable an adaptive adjustment of the margin in the loss function to fit complex relations. PairRE is capable of encoding three important relation patterns: symmetry/antisymmetry, inverse and composition. Given simple constraints on relation representations, PairRE can further encode subrelations.
K. Chen, W. Xu, X. Cheng, X. Zou, Y. Zhang, L. Song, T. Wang, Y. Qi and W. Chu (2020) Question directed graph attention network for numerical reasoning over text, EMNLP 2020:6759-6768

Numerical reasoning over texts, such as addition, subtraction, sorting and counting, is a challenging machine reading comprehension task, since it requires both natural language understanding and arithmetic computation. To address this challenge, we propose a heterogeneous graph representation for the context of the passage and question needed for such reasoning, and design a question directed graph attention network to drive multi-step numerical reasoning over this context graph. Our model, which combines deep learning and graph reasoning, achieves remarkable results on benchmark datasets such as DROP.
X. Chen, W. Xu, K. Chen, T. Wang, S. Jiang, F. Wang, W. Chu and Y. Qi (2020) SpellGCN: Incorporating phonological and visual similarities into language models for Chinese Spelling Check, ACL 2020:871–881

Chinese Spelling Check (CSC) is a task to detect and correct spelling errors in Chinese natural language. This paper proposes to incorporate phonological and visual similarity knowledge into language models for CSC via a specialized graph convolutional network (SpellGCN). The model builds a graph over the characters, and SpellGCN is learned to map this graph into a set of inter-dependent character classifiers. These classifiers are applied to the representations extracted by another network, such as BERT, enabling the whole network to be end-to-end trainable.
X. Lin, W. Jian, J. He, T. Wang, and W. Chu (2020) Generating informative conversational response using recurrent knowledge-interaction and knowledge-copy, ACL 2020:41–52

Knowledge-driven conversation approaches have attracted remarkable research attention recently. However, generating an informative response from multiple relevant pieces of knowledge without losing fluency and coherence is still one of the main challenges. To address this issue, this paper proposes a method that uses recurrent knowledge interaction among response decoding steps to incorporate appropriate knowledge. Furthermore, we introduce a knowledge copy mechanism using a knowledge-aware pointer network to copy words from external knowledge according to the knowledge attention distribution. Our joint neural conversation model, which integrates recurrent Knowledge-Interaction and knowledge Copy (KIC), performs well on generating informative responses.
M. Qiu, F.-L. Li, S. Wang, X. Gao, Y. Chen, W. Zhao, H. Chen, J. Huang and W. Chu (2017) AliMe Chat: A Sequence to Sequence and Rerank based Chatbot Engine, Annual Meeting of the Association for Computational Linguistics (ACL-55 Short Paper)

Computer Vision Papers

W. Hong, J. Lao, W. Ren, J. Wang, J. Chen, W. Chu (2022) Training object detectors from scratch: An empirical study in the era of vision transformer, in Proc. of CVPR 2022

We aim to get rid of the “pre-train & fine-tune” paradigm of vision transformers and train transformer-based object detectors from scratch. One of the key findings is that both architectural changes and more epochs play critical roles in training vision transformer based detectors from scratch.
F. Xu, M. Wang, W. Zhang, Y. Cheng and W. Chu (2021) Discrimination-aware mechanism for fine-grained representation learning, CVPR 2021

Recently, with the emergence of retrieval requirements for specific individuals within the same superclass, e.g., birds, persons, cars, the fine-grained recognition task has attracted a significant amount of attention from academia and industry. In the fine-grained recognition scenario, the inter-class differences are quite diverse and subtle, which makes it challenging to extract all the discriminative cues. The traditional training mechanism optimizes the overall discriminativeness of the whole feature. It may stop early when some feature elements have been trained to distinguish training samples well, leaving other elements of the feature insufficiently trained. This would result in a less generalizable feature extractor that only captures major discriminative cues and ignores subtle ones. Therefore, there is a need for a training mechanism that enforces the discriminativeness of all the elements in the feature to capture more of the subtle visual cues. In this paper, we propose a Discrimination-Aware Mechanism (DAM) that iteratively identifies insufficiently trained elements and improves them. DAM is able to increase the number of well-learned elements, so that the feature extractor captures more visual cues. In this way, a more informative representation is learned, which brings better generalization performance. We show that DAM can be easily applied to both proxy-based and pair-based loss functions, and thus can be used in most existing fine-grained recognition paradigms. Comprehensive experiments on CUB-200-2011, Cars196, Market-1501, and MSMT17 datasets demonstrate the advantages of our DAM based loss over the related state-of-the-art approaches.
W. Hong, P. Guo, W. Zhang, J. Chen and W. Chu (2021) LPSNet: A lightweight solution for fast panoptic segmentation, CVPR 2021

Panoptic segmentation is a challenging task aiming to simultaneously segment objects (things) at instance level and background contents (stuff) at semantic level. Existing methods mostly utilize a two-stage detection network to attain instance segmentation results, and a fully convolutional network to produce semantic segmentation predictions. Post-processing or additional modules are required to handle the conflicts between the outputs from these two nets, which makes such methods suffer from low efficiency, heavy memory consumption and complicated implementation. To simplify the pipeline and decrease computation/memory cost, we propose a one-stage approach called Lightweight Panoptic Segmentation Network (LPSNet), which does not involve proposal, anchor or mask head. Instead, we predict the bounding box and semantic category at each pixel upon the feature map produced by an augmented feature pyramid, and design a parameter-free head to merge the per-pixel bounding box and semantic prediction into the panoptic segmentation output. Our LPSNet is not only efficient in computation and memory, but also accurate in panoptic segmentation. Comprehensive experiments on COCO, Cityscapes and Mapillary Vistas datasets demonstrate the promising effectiveness and efficiency of the proposed LPSNet.
C. Jiang, K. Huang, S. He, X. Yang, W. Zhang, X. Zhang, Y. Cheng, L. Yang, Q. Wang, F. Xu, T. Pan and W. Chu (2021) Learning segment similarity and alignment in large-scale content based video retrieval, ACM MM 2021

Recommender Systems Papers

L. Li, W. Chu, J. Langford and X. Wang (2011) Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms, ACM International Conference on Web Search and Data Mining (WSDM-04):297-306

In this paper, we introduce a replay methodology for contextual bandit algorithm evaluation. Unlike simulator-based approaches, our method is completely data-driven and very easy to adapt to different applications. More importantly, our method can provide provably unbiased evaluations. Our empirical results on a large-scale news article recommendation dataset collected from Yahoo! Front Page agree well with our theoretical results. Furthermore, comparisons between our offline replay and online bucket evaluation of several contextual bandit algorithms show the accuracy and effectiveness of our offline evaluation method. [pdf]
L. Li, W. Chu, J. Langford and R. E. Schapire (2010) A contextual-bandit approach to personalized news article recommendation, International World Wide Web Conference (WWW-19):661-670

Personalized web services strive to adapt their services (advertisements, news articles, etc.) to individual users by making use of both content and user information. Despite a few recent advances, this problem remains challenging for at least two reasons. First, web services feature dynamically changing pools of content, rendering traditional collaborative filtering methods inapplicable. Second, the scale of most web services of practical interest calls for solutions that are fast in both learning and computation. In this work, we model personalized recommendation of news articles as a contextual bandit problem, a principled approach in which a learning algorithm sequentially selects articles to serve users based on contextual information about the users and articles, while simultaneously adapting its article-selection strategy based on user-click feedback to maximize total user clicks. [pdf]
S.-T. Park and W. Chu (2009) Pairwise preference regression for cold-start recommendation, ACM Recommender Systems (RecSys-03):21-28

Recommender systems are widely used in online e-commerce applications to improve user engagement and thereby increase revenue. A key challenge for recommender systems is providing high-quality recommendations to users in "cold-start" situations. We consider three types of cold-start problems: 1) recommendation on existing items for new users; 2) recommendation on new items for existing users; 3) recommendation on new items for new users. We propose predictive feature-based regression models that leverage all available information of users and items, such as user demographic information and item content features, to tackle cold-start problems. The resulting algorithms scale efficiently as a linear function of the number of observations. We verify the usefulness of our approach in three cold-start settings on the MovieLens and EachMovie datasets, by comparing with five alternatives including random, most popular, segmented most popular, and two variations of the Vibes affinity algorithm widely used at Yahoo! for recommendation.
W. Chu and S.-T. Park (2009) Personalized recommendation on dynamic content using predictive bilinear models, International World Wide Web Conference (WWW-18):692-700

In Web-based services of dynamic content (such as news articles), recommender systems face the difficulty of identifying new high-quality items in a timely manner and providing recommendations for new users. We propose a feature-based machine learning approach to personalized recommendation that is capable of handling the cold-start issue effectively. We maintain profiles of content of interest, in which temporal characteristics of the content, e.g. popularity and freshness, are updated in real time. We also maintain profiles of users including demographic information and a summary of user activities within Yahoo! properties. Based on all features in user and content profiles, we develop predictive bilinear regression models to provide accurate personalized recommendations of new items for both existing and new users. This approach results in an offline model with light computational overhead compared with other recommender systems that require online re-training. The proposed framework is general and flexible for other personalized tasks. The superior performance of our approach is verified on a large-scale data set collected from the Today-Module on Yahoo! Front Page, with comparison against six competitive approaches. [pdf] [slides]

Bioinformatics Papers

W. Chu, Z. Ghahramani, R. Krause and D. L. Wild (2006) Identifying protein complexes in high-throughput protein interaction screens using an infinite latent feature model, Pacific Symposium on Biocomputing (PSB-11):231-242

We propose a Bayesian approach to identify protein complexes and their constituents from high-throughput protein-protein interaction screens. An infinite latent feature model that allows for multi-complex membership by individual proteins is coupled with a graph diffusion kernel that evaluates the likelihood of two proteins belonging to the same complex. Gibbs sampling is then used to infer a catalog of protein complexes from the interaction screen data. An advantage of this model is that it places no prior constraints on the number of complexes and automatically infers the number of significant complexes from the data. Validation results using affinity purification/mass spectrometry experimental data from yeast RNA-processing complexes indicate that our method is capable of partitioning the data in a biologically meaningful way.
W. Chu, Z. Ghahramani, F. Falciani and D. L. Wild (2005) Biomarker discovery with Gaussian processes in microarray gene expression data, Bioinformatics 2005(21):3385-3393

In this paper, we describe a gene selection algorithm based on Gaussian processes to discover consistent gene expression patterns associated with ordinal clinical phenotypes. The technique of automatic relevance determination is applied to represent the significance level of the genes in a Bayesian framework. [pdf] [ps] [code]
W. Chu, Z. Ghahramani and D. L. Wild (2004) A graphical model for protein secondary structure prediction, International Conference on Machine Learning (ICML-21):161-168

In this paper, we present a graphical model that extends segmental semi-Markov models (SSMM) to exploit multiple sequence alignment profiles for protein structure prediction. A novel parameterized model is proposed as the likelihood function for the SSMM. By incorporating the information from long range interactions in beta-sheets, this model is capable of carrying out inference on contact maps. [pdf] [ps] [zip] [webserver]

Machine Learning Papers

W. Chu, L. Li, L. Reyzin, and R. E. Schapire (2011) Contextual bandits with linear payoff functions, International Conference on Artificial Intelligence and Statistics (AISTATS-14)

In this paper we study the contextual bandit problem (also known as the multi-armed bandit problem with expert advice) for linear payoff functions. We prove a high-probability regret upper bound. We also prove a lower bound for this setting, matching the upper bound up to logarithmic factors. [pdf]
W. Chu and Z. Ghahramani (2009) Probabilistic models for incomplete multi-dimensional arrays, International Conference on Artificial Intelligence and Statistics (AISTATS-12):89-96

In multiway data, each sample is measured by multiple sets of correlated attributes. We develop a probabilistic framework, known as pTucker, for modeling structural dependency from partially observed multi-dimensional array data. Latent components associated with individual array dimensions are jointly retrieved while the core tensor is integrated out. The resulting algorithm is capable of handling large-scale data sets. We verify the usefulness of this approach by comparing against classical models on applications to modeling amino acid fluorescence, collaborative filtering and a number of benchmark multiway array data sets. [pdf] [third-party pTucker code]
W. Chu, S. S. Keerthi and C. J. Ong (2001) A unified loss function in Bayesian framework for support vector regression, International Conference on Machine Learning (ICML-18):51-58

EMAIL : email dot chuwei at gmail.com

2022.03.09