Summary for 2021-10-09, created on 2021-12-15

Braxlines: Fast and Interactive Toolkit for RL-driven Behavior Engineering beyond Reward Maximization arxiv:2110.04686 📈 67

Shixiang Shane Gu, Manfred Diaz, Daniel C. Freeman, Hiroki Furuta, Seyed Kamyar Seyed Ghasemipour, Anton Raichuk, Byron David, Erik Frey, Erwin Coumans, Olivier Bachem

**Abstract:** The goal of continuous control is to synthesize desired behaviors. In reinforcement learning (RL)-driven approaches, this is often accomplished through careful task reward engineering for efficient exploration and running an off-the-shelf RL algorithm. While reward maximization is at the core of RL, reward engineering is not the only -- sometimes nor the easiest -- way for specifying complex behaviors. In this paper, we introduce \braxlines, a toolkit for fast and interactive RL-driven behavior generation beyond simple reward maximization that includes Composer, a programmatic API for generating continuous control environments, and set of stable and well-tested baselines for two families of algorithms -- mutual information maximization (MiMax) and divergence minimization (DMin) -- supporting unsupervised skill learning and distribution sketching as other modes of behavior specification. In addition, we discuss how to standardize metrics for evaluating these algorithms, which can no longer rely on simple reward maximization. Our implementations build on a hardware-accelerated Brax simulator in Jax with minimal modifications, enabling behavior synthesis within minutes of training. We hope Braxlines can serve as an interactive toolkit for rapid creation and testing of environments and behaviors, empowering explosions of future benchmark designs and new modes of RL-driven behavior generation and their algorithmic research.

Vector-quantized Image Modeling with Improved VQGAN arxiv:2110.04627 📈 45

Jiahui Yu, Xin Li, Jing Yu Koh, Han Zhang, Ruoming Pang, James Qin, Alexander Ku, Yuanzhong Xu, Jason Baldridge, Yonghui Wu

**Abstract:** Pretraining language models with next-token prediction on massive text corpora has delivered phenomenal zero-shot, few-shot, transfer learning and multi-tasking capabilities on both generative and discriminative language tasks. Motivated by this success, we explore a Vector-quantized Image Modeling (VIM) approach that involves pretraining a Transformer to predict rasterized image tokens autoregressively. The discrete image tokens are encoded from a learned Vision-Transformer-based VQGAN (ViT-VQGAN). We first propose multiple improvements over vanilla VQGAN from architecture to codebook learning, yielding better efficiency and reconstruction fidelity. The improved ViT-VQGAN further improves vector-quantized image modeling tasks, including unconditional, class-conditioned image generation and unsupervised representation learning. When trained on ImageNet at 256x256 resolution, we achieve Inception Score (IS) of 175.1 and Fr'echet Inception Distance (FID) of 4.17, a dramatic improvement over the vanilla VQGAN, which obtains 70.6 and 17.04 for IS and FID, respectively. Based on ViT-VQGAN and unsupervised pretraining, we further evaluate the pretrained Transformer by averaging intermediate features, similar to Image GPT (iGPT). This ImageNet-pretrained VIM-L significantly beats iGPT-L on linear-probe accuracy from 60.3% to 72.2% for a similar model size. ViM-L also outperforms iGPT-XL which is trained with extra web image data and larger model size.

Evaluating Predictive Distributions: Does Bayesian Deep Learning Work? arxiv:2110.04629 📈 8

Ian Osband, Zheng Wen, Seyed Mohammad Asghari, Vikranth Dwaracherla, Botao Hao, Morteza Ibrahimi, Dieterich Lawson, Xiuyuan Lu, Brendan O'Donoghue, Benjamin Van Roy

**Abstract:** Posterior predictive distributions quantify uncertainties ignored by point estimates. This paper introduces \textit{The Neural Testbed}, which provides tools for the systematic evaluation of agents that generate such predictions. Crucially, these tools assess not only the quality of marginal predictions per input, but also joint predictions given many inputs. Joint distributions are often critical for useful uncertainty quantification, but they have been largely overlooked by the Bayesian deep learning community. We benchmark several approaches to uncertainty estimation using a neural-network-based data generating process. Our results reveal the importance of evaluation beyond marginal predictions. Further, they reconcile sources of confusion in the field, such as why Bayesian deep learning approaches that generate accurate marginal predictions perform poorly in sequential decision tasks, how incorporating priors can be helpful, and what roles epistemic versus aleatoric uncertainty play when evaluating performance. We also present experiments on real-world challenge datasets, which show a high correlation with testbed results, and that the importance of evaluating joint predictive distributions carries over to real data. As part of this effort, we opensource The Neural Testbed, including all implementations from this paper.

Iterative Refinement Graph Neural Network for Antibody Sequence-Structure Co-design arxiv:2110.04624 📈 7

Wengong Jin, Jeremy Wohlwend, Regina Barzilay, Tommi Jaakkola

**Abstract:** Antibodies are versatile proteins that bind to pathogens like viruses and stimulate the adaptive immune system. The specificity of antibody binding is determined by complementarity-determining regions (CDRs) at the tips of these Y-shaped proteins. In this paper, we propose a generative model to automatically design the CDRs of antibodies with enhanced binding specificity or neutralization capabilities. Previous generative approaches formulate protein design as a structure-conditioned sequence generation task, assuming the desired 3D structure is given a priori. In contrast, we propose to co-design the sequence and 3D structure of CDRs as graphs. Our model unravels a sequence autoregressively while iteratively refining its predicted global structure. The inferred structure in turn guides subsequent residue choices. For efficiency, we model the conditional dependence between residues inside and outside of a CDR in a coarse-grained manner. Our method achieves superior log-likelihood on the test set and outperforms previous baselines in designing antibodies capable of neutralizing the SARS-CoV-2 virus.

Focus Your Distribution: Coarse-to-Fine Non-Contrastive Learning for Anomaly Detection and Localization arxiv:2110.04538 📈 7

Ye Zheng, Xiang Wang, Rui Deng, Tianpeng Bao, Rui Zhao, Liwei Wu

**Abstract:** The essence of unsupervised anomaly detection is to learn the compact distribution of normal samples and detect outliers as anomalies in testing. Meanwhile, the anomalies in real-world are usually subtle and fine-grained in a high-resolution image especially for industrial applications. Towards this end, we propose a novel framework for unsupervised anomaly detection and localization. Our method aims at learning dense and compact distribution from normal images with a coarse-to-fine alignment process. The coarse alignment stage standardizes the pixel-wise position of objects in both image and feature levels. The fine alignment stage then densely maximizes the similarity of features among all corresponding locations in a batch. To facilitate the learning with only normal images, we propose a new pretext task called non-contrastive learning for the fine alignment stage. Non-contrastive learning extracts robust and discriminating normal image representations without making assumptions on abnormal samples, and it thus empowers our model to generalize to various anomalous scenarios. Extensive experiments on two typical industrial datasets of MVTec AD and BenTech AD demonstrate that our framework is effective in detecting various real-world defects and achieves a new state-of-the-art in industrial unsupervised anomaly detection.

Vision Transformer based COVID-19 Detection using Chest X-rays arxiv:2110.04458 📈 7

Koushik Sivarama Krishnan, Karthik Sivarama Krishnan

**Abstract:** COVID-19 is a global pandemic, and detecting them is a momentous task for medical professionals today due to its rapid mutations. Current methods of examining chest X-rays and CT scan requires profound knowledge and are time consuming, which suggests that it shrinks the precious time of medical practitioners when people's lives are at stake. This study tries to assist this process by achieving state-of-the-art performance in classifying chest X-rays by fine-tuning Vision Transformer(ViT). The proposed approach uses pretrained models, fine-tuned for detecting the presence of COVID-19 disease on chest X-rays. This approach achieves an accuracy score of 97.61%, precision score of 95.34%, recall score of 93.84% and, f1-score of 94.58%. This result signifies the performance of transformer-based models on chest X-ray.

Learning Visual Shape Control of Novel 3D Deformable Objects from Partial-View Point Clouds arxiv:2110.04685 📈 5

Bao Thach, Brian Y. Cho, Alan Kuntz, Tucker Hermans

**Abstract:** If robots could reliably manipulate the shape of 3D deformable objects, they could find applications in fields ranging from home care to warehouse fulfillment to surgical assistance. Analytic models of elastic, 3D deformable objects require numerous parameters to describe the potentially infinite degrees of freedom present in determining the object's shape. Previous attempts at performing 3D shape control rely on hand-crafted features to represent the object shape and require training of object-specific control models. We overcome these issues through the use of our novel DeformerNet neural network architecture, which operates on a partial-view point cloud of the object being manipulated and a point cloud of the goal shape to learn a low-dimensional representation of the object shape. This shape embedding enables the robot to learn to define a visual servo controller that provides Cartesian pose changes to the robot end-effector causing the object to deform towards its target shape. Crucially, we demonstrate both in simulation and on a physical robot that DeformerNet reliably generalizes to object shapes and material stiffness not seen during training and outperforms comparison methods for both the generic shape control and the surgical task of retraction.

Representation Learning for Online and Offline RL in Low-rank MDPs arxiv:2110.04652 📈 5

Masatoshi Uehara, Xuezhou Zhang, Wen Sun

**Abstract:** This work studies the question of Representation Learning in RL: how can we learn a compact low-dimensional representation such that on top of the representation we can perform RL procedures such as exploration and exploitation, in a sample efficient manner. We focus on the low-rank Markov Decision Processes (MDPs) where the transition dynamics correspond to a low-rank transition matrix. Unlike prior works that assume the representation is known (e.g., linear MDPs), here we need to learn the representation for the low-rank MDP. We study both the online RL and offline RL settings. For the online setting, operating with the same computational oracles used in FLAMBE (Agarwal et.al), the state-of-art algorithm for learning representations in low-rank MDPs, we propose an algorithm REP-UCB Upper Confidence Bound driven Representation learning for RL), which significantly improves the sample complexity from $\widetilde{O}( A^9 d^7 / (ε^{10} (1-γ)^{22}))$ for FLAMBE to $\widetilde{O}( A^4 d^4 / (ε^2 (1-γ)^{3}) )$ with $d$ being the rank of the transition matrix (or dimension of the ground truth representation), $A$ being the number of actions, and $γ$ being the discounted factor. Notably, REP-UCB is simpler than FLAMBE, as it directly balances the interplay between representation learning, exploration, and exploitation, while FLAMBE is an explore-then-commit style approach and has to perform reward-free exploration step-by-step forward in time. For the offline RL setting, we develop an algorithm that leverages pessimism to learn under a partial coverage condition: our algorithm is able to compete against any policy as long as it is covered by the offline distribution.

Does Preprocessing Help Training Over-parameterized Neural Networks? arxiv:2110.04622 📈 5

Zhao Song, Shuo Yang, Ruizhe Zhang

**Abstract:** Deep neural networks have achieved impressive performance in many areas. Designing a fast and provable method for training neural networks is a fundamental question in machine learning. The classical training method requires paying $Ω(mnd)$ cost for both forward computation and backward computation, where $m$ is the width of the neural network, and we are given $n$ training points in $d$-dimensional space. In this paper, we propose two novel preprocessing ideas to bypass this $Ω(mnd)$ barrier: $\bullet$ First, by preprocessing the initial weights of the neural networks, we can train the neural network in $\widetilde{O}(m^{1-Θ(1/d)} n d)$ cost per iteration. $\bullet$ Second, by preprocessing the input data points, we can train the neural network in $\widetilde{O} (m^{4/5} nd )$ cost per iteration. From the technical perspective, our result is a sophisticated combination of tools in different fields, greedy-type convergence analysis in optimization, sparsity observation in practical work, high-dimensional geometric search in data structure, concentration and anti-concentration in probability. Our results also provide theoretical insights for a large number of previously established fast training methods. In addition, our classical algorithm can be generalized to the Quantum computation model. Interestingly, we can get a similar sublinear cost per iteration but avoid preprocessing initial weights or input data points.

Discriminative Multimodal Learning via Conditional Priors in Generative Models arxiv:2110.04616 📈 5

Rogelio A. Mancisidor, Michael Kampffmeyer, Kjersti Aas, Robert Jenssen

**Abstract:** Deep generative models with latent variables have been used lately to learn joint representations and generative processes from multi-modal data. These two learning mechanisms can, however, conflict with each other and representations can fail to embed information on the data modalities. This research studies the realistic scenario in which all modalities and class labels are available for model training, but where some modalities and labels required for downstream tasks are missing. We show, in this scenario, that the variational lower bound limits mutual information between joint representations and missing modalities. We, to counteract these problems, introduce a novel conditional multi-modal discriminative model that uses an informative prior distribution and optimizes a likelihood-free objective function that maximizes mutual information between joint representations and missing modalities. Extensive experimentation shows the benefits of the model we propose, the empirical results showing that our model achieves state-of-the-art results in representative problems such as downstream classification, acoustic inversion and annotation generation.

Towards Open-World Feature Extrapolation: An Inductive Graph Learning Approach arxiv:2110.04514 📈 5

Qitian Wu, Chenxiao Yang, Junchi Yan

**Abstract:** We target open-world feature extrapolation problem where the feature space of input data goes through expansion and a model trained on partially observed features needs to handle new features in test data without further retraining. The problem is of much significance for dealing with features incrementally collected from different fields. To this end, we propose a new learning paradigm with graph representation and learning. Our framework contains two modules: 1) a backbone network (e.g., feedforward neural nets) as a lower model takes features as input and outputs predicted labels; 2) a graph neural network as an upper model learns to extrapolate embeddings for new features via message passing over a feature-data graph built from observed data. Based on our framework, we design two training strategies, a self-supervised approach and an inductive learning approach, to endow the model with extrapolation ability and alleviate feature-level over-fitting. We also provide theoretical analysis on the generalization error on test data with new features, which dissects the impact of training features and algorithms on generalization performance. Our experiments over several classification datasets and large-scale advertisement click prediction datasets demonstrate that our model can produce effective embeddings for unseen features and significantly outperforms baseline methods that adopt KNN and local aggregation.

Demystifying the Transferability of Adversarial Attacks in Computer Networks arxiv:2110.04488 📈 5

Ehsan Nowroozi, Mauro Conti, Yassine Mekdad, Mohammad Hajian Berenjestanaki, Abdeslam EL Fergougui

**Abstract:** Deep Convolutional Neural Networks (CNN) models are one of the most popular networks in deep learning. With their large fields of application in different areas, they are extensively used in both academia and industry. CNN-based models include several exciting implementations such as early breast cancer detection or detecting developmental delays in children (e.g., autism, speech disorders, etc.). However, previous studies demonstrate that these models are subject to various adversarial attacks. Interestingly, some adversarial examples could potentially still be effective against different unknown models. This particular property is known as adversarial transferability, and prior works slightly analyzed this characteristic in a very limited application domain. In this paper, we aim to demystify the transferability threats in computer networks by studying the possibility of transferring adversarial examples. In particular, we provide the first comprehensive study which assesses the robustness of CNN-based models for computer networks against adversarial transferability. In our experiments, we consider five different attacks: (1) the Iterative Fast Gradient Method (I-FGSM), (2) the Jacobian-based Saliency Map attack (JSMA), (3) the L-BFGS attack, (4) the Projected Gradient Descent attack (PGD), and (5) the DeepFool attack. These attacks are performed against two well-known datasets: the N-BaIoT dataset and the Domain Generating Algorithms (DGA) dataset. Our results show that the transferability happens in specific use cases where the adversary can easily compromise the victim's network with very few knowledge of the targeted model.

PAMA-TTS: Progression-Aware Monotonic Attention for Stable Seq2Seq TTS With Accurate Phoneme Duration Control arxiv:2110.04486 📈 5

Yunchao He, Jian Luan, Yujun Wang

**Abstract:** Sequence expansion between encoder and decoder is a critical challenge in sequence-to-sequence TTS. Attention-based methods achieve great naturalness but suffer from unstable issues like missing and repeating phonemes, not to mention accurate duration control. Duration-informed methods, on the contrary, seem to easily adjust phoneme duration but show obvious degradation in speech naturalness. This paper proposes PAMA-TTS to address the problem. It takes the advantage of both flexible attention and explicit duration models. Based on the monotonic attention mechanism, PAMA-TTS also leverages token duration and relative position of a frame, especially countdown information, i.e. in how many future frames the present phoneme will end. They help the attention to move forward along the token sequence in a soft but reliable control. Experimental results prove that PAMA-TTS achieves the highest naturalness, while has on-par or even better duration controllability than the duration-informed model.

An Augmented Reality Platform for Introducing Reinforcement Learning to K-12 Students with Robots arxiv:2110.04697 📈 4

Ziyi Zhang, Samuel Micah Akai-Nettey, Adonai Addo, Chris Rogers, Jivko Sinapov

**Abstract:** Interactive reinforcement learning, where humans actively assist during an agent's learning process, has the promise to alleviate the sample complexity challenges of practical algorithms. However, the inner workings and state of the robot are typically hidden from the teacher when humans provide feedback. To create a common ground between the human and the learning robot, in this paper, we propose an Augmented Reality (AR) system that reveals the hidden state of the learning to the human users. This paper describes our system's design and implementation and concludes with a discussion on two directions for future work which we are pursuing: 1) use of our system in AI education activities at the K-12 level; and 2) development of a framework for an AR-based human-in-the-loop reinforcement learning, where the human teacher can see sensory and cognitive representations of the robot overlaid in the real world.

Complex Network-Based Approach for Feature Extraction and Classification of Musical Genres arxiv:2110.04654 📈 4

Matheus Henrique Pimenta-Zanon, Glaucia Maria Bressan, Fabrício Martins Lopes

**Abstract:** Musical genre's classification has been a relevant research topic. The association between music and genres is fundamental for the media industry, which manages musical recommendation systems, and for music streaming services, which may appear classified by genres. In this context, this work presents a feature extraction method for the automatic classification of musical genres, based on complex networks and their topological measurements. The proposed method initially converts the musics into sequences of musical notes and then maps the sequences as complex networks. Topological measurements are extracted to characterize the network topology, which composes a feature vector that applies to the classification of musical genres. The method was evaluated in the classification of 10 musical genres by adopting the GTZAN dataset and 8 musical genres by adopting the FMA dataset. The results were compared with methods in the literature. The proposed method outperformed all compared methods by presenting high accuracy and low standard deviation, showing its suitability for the musical genre's classification, which contributes to the media industry in the automatic classification with assertiveness and robustness. The proposed method is implemented in an open source in the Python language and freely available at https://github.com/omatheuspimenta/examinner.

On the Relation between Syntactic Divergence and Zero-Shot Performance arxiv:2110.04644 📈 4

Ofir Arviv, Dmitry Nikolaev, Taelin Karidi, Omri Abend

**Abstract:** We explore the link between the extent to which syntactic relations are preserved in translation and the ease of correctly constructing a parse tree in a zero-shot setting. While previous work suggests such a relation, it tends to focus on the macro level and not on the level of individual edges-a gap we aim to address. As a test case, we take the transfer of Universal Dependencies (UD) parsing from English to a diverse set of languages and conduct two sets of experiments. In one, we analyze zero-shot performance based on the extent to which English source edges are preserved in translation. In another, we apply three linguistically motivated transformations to UD, creating more cross-lingually stable versions of it, and assess their zero-shot parsability. In order to compare parsing performance across different schemes, we perform extrinsic evaluation on the downstream task of cross-lingual relation extraction (RE) using a subset of a popular English RE benchmark translated to Russian and Korean. In both sets of experiments, our results suggest a strong relation between cross-lingual stability and zero-shot parsing performance.

DenseNet approach to segmentation and classification of dermatoscopic skin lesions images arxiv:2110.04632 📈 4

Reza Zare, Arash Pourkazemi

**Abstract:** At present, cancer is one of the most important health issues in the world. Because early detection and appropriate treatment in cancer are very effective in the recovery and survival of patients, image processing as a diagnostic tool can help doctors to diagnose in the first recognition of cancer. One of the most important steps in diagnosing a skin lesion is to automatically detect the border of the skin image because the accuracy of the next steps depends on it. If these subtleties are identified, they can have a great impact on the diagnosis of the disease. Therefore, there is a good opportunity to develop more accurate algorithms to analyze such images. This paper proposes an improved method for segmentation and classification for skin lesions using two architectures, the U-Net for image segmentation and the DenseNet121 for image classification which have excellent accuracy. We tested the segmentation architecture of our model on the ISIC-2018 dataset and the classification on the HAM10000 dataset. Our results show that the combination of U-Net and DenseNet121 architectures provides acceptable results in dermatoscopic image analysis compared to previous research. Another classification examined in this study is cancerous and non-cancerous samples. In this classification, cancerous and non-cancerous samples were detected in DenseNet121 network with 79.49% and 93.11% accuracy respectively.

Personalized Automatic Speech Recognition Trained on Small Disordered Speech Datasets arxiv:2110.04612 📈 4

Jimmy Tobin, Katrin Tomanek

**Abstract:** This study investigates the performance of personalized automatic speech recognition (ASR) for recognizing disordered speech using small amounts of per-speaker adaptation data. We trained personalized models for 195 individuals with different types and severities of speech impairment with training sets ranging in size from <1 minute to 18-20 minutes of speech data. Word error rate (WER) thresholds were selected to determine Success Percentage (the percentage of personalized models reaching the target WER) in different application scenarios. For the home automation scenario, 79% of speakers reached the target WER with 18-20 minutes of speech; but even with only 3-4 minutes of speech, 63% of speakers reached the target WER. Further evaluation found similar improvement on test sets with conversational and out-of-domain, unprompted phrases. Our results demonstrate that with only a few minutes of recordings, individuals with disordered speech could benefit from personalized ASR.

Learning Single/Multi-Attribute of Object with Symmetry and Group arxiv:2110.04603 📈 4

Yong-Lu Li, Yue Xu, Xinyu Xu, Xiaohan Mao, Cewu Lu

**Abstract:** Attributes and objects can compose diverse compositions. To model the compositional nature of these concepts, it is a good choice to learn them as transformations, e.g., coupling and decoupling. However, complex transformations need to satisfy specific principles to guarantee rationality. Here, we first propose a previously ignored principle of attribute-object transformation: Symmetry. For example, coupling peeled-apple with attribute peeled should result in peeled-apple, and decoupling peeled from apple should still output apple. Incorporating the symmetry, we propose a transformation framework inspired by group theory, i.e., SymNet. It consists of two modules: Coupling Network and Decoupling Network. We adopt deep neural networks to implement SymNet and train it in an end-to-end paradigm with the group axioms and symmetry as objectives. Then, we propose a Relative Moving Distance (RMD) based method to utilize the attribute change instead of the attribute pattern itself to classify attributes. Besides the compositions of single-attribute and object, our RMD is also suitable for complex compositions of multiple attributes and objects when incorporating attribute correlations. SymNet can be utilized for attribute learning, compositional zero-shot learning and outperforms the state-of-the-art on four widely-used benchmarks. Code is at https://github.com/DirtyHarryLYL/SymNet.

Automatic Recognition of Abdominal Organs in Ultrasound Images based on Deep Neural Networks and K-Nearest-Neighbor Classification arxiv:2110.04563 📈 4

Keyu Li, Yangxin Xu, Max Q. -H. Meng

**Abstract:** Abdominal ultrasound imaging has been widely used to assist in the diagnosis and treatment of various abdominal organs. In order to shorten the examination time and reduce the cognitive burden on the sonographers, we present a classification method that combines the deep learning techniques and k-Nearest-Neighbor (k-NN) classification to automatically recognize various abdominal organs in the ultrasound images in real time. Fine-tuned deep neural networks are used in combination with PCA dimension reduction to extract high-level features from raw ultrasound images, and a k-NN classifier is employed to predict the abdominal organ in the image. We demonstrate the effectiveness of our method in the task of ultrasound image classification to automatically recognize six abdominal organs. A comprehensive comparison of different configurations is conducted to study the influence of different feature extractors and classifiers on the classification accuracy. Both quantitative and qualitative results show that with minimal training effort, our method can "lazily" recognize the abdominal organs in the ultrasound images in real time with an accuracy of 96.67%. Our implementation code is publicly available at: https://github.com/LeeKeyu/abdominal_ultrasound_classification.

Graph Neural Networks in Real-Time Fraud Detection with Lambda Architecture arxiv:2110.04559 📈 4

Mingxuan Lu, Zhichao Han, Zitao Zhang, Yang Zhao, Yinan Shan

**Abstract:** Transaction checkout fraud detection is an essential risk control components for E-commerce marketplaces. In order to leverage graph networks to decrease fraud rate efficiently and guarantee the information flow passed through neighbors only from the past of the checkouts, we first present a novel Directed Dynamic Snapshot (DDS) linkage design for graph construction and a Lambda Neural Networks (LNN) architecture for effective inference with Graph Neural Networks embeddings. Experiments show that our LNN on DDS graph, outperforms baseline models significantly and is computational efficient for real-time fraud detection.

Generating Disentangled Arguments with Prompts: A Simple Event Extraction Framework that Works arxiv:2110.04525 📈 4

Jinghui Si, Xutan Peng, Chen Li, Haotian Xu, Jianxin Li

**Abstract:** Event Extraction bridges the gap between text and event signals. Based on the assumption of trigger-argument dependency, existing approaches have achieved state-of-the-art performance with expert-designed templates or complicated decoding constraints. In this paper, for the first time we introduce the prompt-based learning strategy to the domain of Event Extraction, which empowers the automatic exploitation of label semantics on both input and output sides. To validate the effectiveness of the proposed generative method, we conduct extensive experiments with 11 diverse baselines. Empirical results show that, in terms of F1 score on Argument Extraction, our simple architecture is stronger than any other generative counterpart and even competitive with algorithms that require template engineering. Regarding the measure of recall, it sets new overall records for both Argument and Trigger Extractions. We hereby recommend this framework to the community, with the code publicly available at https://git.io/GDAP.

Pairwise Margin Maximization for Deep Neural Networks arxiv:2110.04519 📈 4

Berry Weinstein, Shai Fine, Yacov Hel-Or

**Abstract:** The weight decay regularization term is widely used during training to constrain expressivity, avoid overfitting, and improve generalization. Historically, this concept was borrowed from the SVM maximum margin principle and extended to multi-class deep networks. Carefully inspecting this principle reveals that it is not optimal for multi-class classification in general, and in particular when using deep neural networks. In this paper, we explain why this commonly used principle is not optimal and propose a new regularization scheme, called {\em Pairwise Margin Maximization} (PMM), which measures the minimal amount of displacement an instance should take until its predicted classification is switched. In deep neural networks, PMM can be implemented in the vector space before the network's output layer, i.e., in the deep feature space, where we add an additional normalization term to avoid convergence to a trivial solution. We demonstrate empirically a substantial improvement when training a deep neural network with PMM compared to the standard regularization terms.

Paperswithtopic: Topic Identification from Paper Title Only arxiv:2110.15721 📈 3

Daehyun Cho, Christian Wallraven

**Abstract:** The deep learning field is growing rapidly as witnessed by the exponential growth of papers submitted to journals, conferences, and pre-print servers. To cope with the sheer number of papers, several text mining tools from natural language processing (NLP) have been proposed that enable researchers to keep track of recent findings. In this context, our paper makes two main contributions: first, we collected and annotated a dataset of papers paired by title and sub-field from the field of artificial intelligence (AI), and, second, we present results on how to predict a paper's AI sub-field from a given paper title only. Importantly, for the latter, short-text classification task we compare several algorithms from conventional machine learning all the way up to recent, larger transformer architectures. Finally, for the transformer models, we also present gradient-based, attention visualizations to further explain the model's classification process. All code can be found at \url{https://github.com/1pha/paperswithtopic}

An In-depth Summary of Recent Artificial Intelligence Applications in Drug Design arxiv:2110.05478 📈 3

Yi Zhang

**Abstract:** As a promising tool to navigate in the vast chemical space, artificial intelligence (AI) is leveraged for drug design. From the year 2017 to 2021, the number of applications of several recent AI models (i.e. graph neural network (GNN), recurrent neural network (RNN), variation autoencoder (VAE), generative adversarial network (GAN), flow and reinforcement learning (RL)) in drug design increases significantly. Many relevant literature reviews exist. However, none of them provides an in-depth summary of many applications of the recent AI models in drug design. To complement the existing literature, this survey includes the theoretical development of the previously mentioned AI models and detailed summaries of 42 recent applications of AI in drug design. Concretely, 13 of them leverage GNN for molecular property prediction and 29 of them use RL and/or deep generative models for molecule generation and optimization. In most cases, the focus of the summary is the models, their variants, and modifications for specific tasks in drug design. Moreover, 60 additional applications of AI in molecule generation and optimization are briefly summarized in a table. Finally, this survey provides a holistic discussion of the abundant applications so that the tasks, potential solutions, and challenges in AI-based drug design become evident.

K-Splits: Improved K-Means Clustering Algorithm to Automatically Detect the Number of Clusters arxiv:2110.04660 📈 3

Seyed Omid Mohammadi, Ahmad Kalhor, Hossein Bodaghi

**Abstract:** This paper introduces k-splits, an improved hierarchical algorithm based on k-means to cluster data without prior knowledge of the number of clusters. K-splits starts from a small number of clusters and uses the most significant data distribution axis to split these clusters incrementally into better fits if needed. Accuracy and speed are two main advantages of the proposed method. We experiment on six synthetic benchmark datasets plus two real-world datasets MNIST and Fashion-MNIST, to prove that our algorithm has excellent accuracy in finding the correct number of clusters under different conditions. We also show that k-splits is faster than similar methods and can even be faster than the standard k-means in lower dimensions. Finally, we suggest using k-splits to uncover the exact position of centroids and then input them as initial points to the k-means algorithm to fine-tune the results.

Streaming on-device detection of device directed speech from voice and touch-based invocation arxiv:2110.04656 📈 3

Ognjen Rudovic, Akanksha Bindal, Vineet Garg, Pramod Simha, Pranay Dighe, Sachin Kajarekar

**Abstract:** When interacting with smart devices such as mobile phones or wearables, the user typically invokes a virtual assistant (VA) by saying a keyword or by pressing a button on the device. However, in many cases, the VA can accidentally be invoked by the keyword-like speech or accidental button press, which may have implications on user experience and privacy. To this end, we propose an acoustic false-trigger-mitigation (FTM) approach for on-device device-directed speech detection that simultaneously handles the voice-trigger and touch-based invocation. To facilitate the model deployment on-device, we introduce a new streaming decision layer, derived using the notion of temporal convolutional networks (TCN) [1], known for their computational efficiency. To the best of our knowledge, this is the first approach that can detect device-directed speech from more than one invocation type in a streaming fashion. We compare this approach with streaming alternatives based on vanilla Average layer, and canonical LSTMs, and show: (i) that all the models show only a small degradation in accuracy compared with the invocation-specific models, and (ii) that the newly introduced streaming TCN consistently performs better or comparable with the alternatives, while mitigating device undirected speech faster in time, and with (relative) reduction in runtime peak-memory over the LSTM-based approach of 33% vs. 7%, when compared to a non-streaming counterpart.

A Framework for Rationale Extraction for Deep QA models arxiv:2110.04620 📈 3

Sahana Ramnath, Preksha Nema, Deep Sahni, Mitesh M. Khapra

**Abstract:** As neural-network-based QA models become deeper and more complex, there is a demand for robust frameworks which can access a model's rationale for its prediction. Current techniques that provide insights on a model's working are either dependent on adversarial datasets or are proposing models with explicit explanation generation components. These techniques are time-consuming and challenging to extend to existing models and new datasets. In this work, we use `Integrated Gradients' to extract rationale for existing state-of-the-art models in the task of Reading Comprehension based Question Answering (RCQA). On detailed analysis and comparison with collected human rationales, we find that though ~40-80% words of extracted rationale coincide with the human rationale (precision), only 6-19% of human rationale is present in the extracted rationale (recall).

Learning MRI Artifact Removal With Unpaired Data arxiv:2110.04604 📈 3

Siyuan Liu, Kim-Han Thung, Liangqiong Qu, Weili Lin, Dinggang Shen, Pew-Thian Yap

**Abstract:** Retrospective artifact correction (RAC) improves image quality post acquisition and enhances image usability. Recent machine learning driven techniques for RAC are predominantly based on supervised learning and therefore practical utility can be limited as data with paired artifact-free and artifact-corrupted images are typically insufficient or even non-existent. Here we show that unwanted image artifacts can be disentangled and removed from an image via an RAC neural network learned with unpaired data. This implies that our method does not require matching artifact-corrupted data to be either collected via acquisition or generated via simulation. Experimental results demonstrate that our method is remarkably effective in removing artifacts and retaining anatomical details in images with different contrasts.

Towards Data-Free Domain Generalization arxiv:2110.04545 📈 3

Ahmed Frikha, Haokun Chen, Denis Krompaß, Thomas Runkler, Volker Tresp

**Abstract:** In this work, we investigate the unexplored intersection of domain generalization and data-free learning. In particular, we address the question: How can knowledge contained in models trained on different source data domains can be merged into a single model that generalizes well to unseen target domains, in the absence of source and target domain data? Machine learning models that can cope with domain shift are essential for for real-world scenarios with often changing data distributions. Prior domain generalization methods typically rely on using source domain data, making them unsuitable for private decentralized data. We define the novel problem of Data-Free Domain Generalization (DFDG), a practical setting where models trained on the source domains separately are available instead of the original datasets, and investigate how to effectively solve the domain generalization problem in that case. We propose DEKAN, an approach that extracts and fuses domain-specific knowledge from the available teacher models into a student model robust to domain shift. Our empirical evaluation demonstrates the effectiveness of our method which achieves first state-of-the-art results in DFDG by significantly outperforming ensemble and data-free knowledge distillation baselines.

The Inductive Bias of In-Context Learning: Rethinking Pretraining Example Design arxiv:2110.04541 📈 3

Yoav Levine, Noam Wies, Daniel Jannai, Dan Navon, Yedid Hoshen, Amnon Shashua

**Abstract:** Pretraining Neural Language Models (NLMs) over a large corpus involves chunking the text into training examples, which are contiguous text segments of sizes processable by the neural architecture. We highlight a bias introduced by this common practice: we prove that the pretrained NLM can model much stronger dependencies between text segments that appeared in the same training example, than it can between text segments that appeared in different training examples. This intuitive result has a twofold role. First, it formalizes the motivation behind a broad line of recent successful NLM training heuristics, proposed for the pretraining and fine-tuning stages, which do not necessarily appear related at first glance. Second, our result clearly indicates further improvements to be made in NLM pretraining for the benefit of Natural Language Understanding tasks. As an example, we propose "kNN-Pretraining": we show that including semantically related non-neighboring sentences in the same pretraining example yields improved sentence representations and open domain question answering abilities. This theoretically motivated degree of freedom for "pretraining example design" indicates new training schemes for self-improving representations.

Colour augmentation for improved semi-supervised semantic segmentation arxiv:2110.04487 📈 3

Geoff French, Michal Mackiewicz

**Abstract:** Consistency regularization describes a class of approaches that have yielded state-of-the-art results for semi-supervised classification. While semi-supervised semantic segmentation proved to be more challenging, a number of successful approaches have been recently proposed. Recent work explored the challenges involved in using consistency regularization for segmentation problems. In their self-supervised work Chen et al. found that colour augmentation prevents a classification network from using image colour statistics as a short-cut for self-supervised learning via instance discrimination. Drawing inspiration from this we find that a similar problem impedes semi-supervised semantic segmentation and offer colour augmentation as a solution, improving semi-supervised semantic segmentation performance on challenging photographic imagery.

Visualizing the embedding space to explain the effect of knowledge distillation arxiv:2110.04483 📈 3

Hyun Seung Lee, Christian Wallraven

**Abstract:** Recent research has found that knowledge distillation can be effective in reducing the size of a network and in increasing generalization. A pre-trained, large teacher network, for example, was shown to be able to bootstrap a student model that eventually outperforms the teacher in a limited label environment. Despite these advances, it still is relatively unclear \emph{why} this method works, that is, what the resulting student model does 'better'. To address this issue, here, we utilize two non-linear, low-dimensional embedding methods (t-SNE and IVIS) to visualize representation spaces of different layers in a network. We perform a set of extensive experiments with different architecture parameters and distillation methods. The resulting visualizations and metrics clearly show that distillation guides the network to find a more compact representation space for higher accuracy already in earlier layers compared to its non-distilled version.

Predicting decision-making in the future: Human versus Machine arxiv:2110.04465 📈 3

Hoe Sung Ryu, Uijong Ju, Christian Wallraven

**Abstract:** Deep neural networks (DNNs) have become remarkably successful in data prediction, and have even been used to predict future actions based on limited input. This raises the question: do these systems actually "understand" the event similar to humans? Here, we address this issue using videos taken from an accident situation in a driving simulation. In this situation, drivers had to choose between crashing into a suddenly-appeared obstacle or steering their car off a previously indicated cliff. We compared how well humans and a DNN predicted this decision as a function of time before the event. The DNN outperformed humans for early time-points, but had an equal performance for later time-points. Interestingly, spatio-temporal image manipulations and Grad-CAM visualizations uncovered some expected behavior, but also highlighted potential differences in temporal processing for the DNN.

An Overview of Techniques for Biomarker Discovery in Voice Signal arxiv:2110.04678 📈 2

Rita Singh, Ankit Shah, Hira Dhamyal

**Abstract:** This paper reflects on the effect of several categories of medical conditions on human voice, focusing on those that may be hypothesized to have effects on voice, but for which the changes themselves may be subtle enough to have eluded observation in standard analytical examinations of the voice signal. It presents three categories of techniques that can potentially uncover such elusive biomarkers and allow them to be measured and used for predictive and diagnostic purposes. These approaches include proxy techniques, model-based analytical techniques and data-driven AI techniques.

Leveraging Experience in Lazy Search arxiv:2110.04669 📈 2

Mohak Bhardwaj, Sanjiban Choudhury, Byron Boots, Siddhartha Srinivasa

**Abstract:** Lazy graph search algorithms are efficient at solving motion planning problems where edge evaluation is the computational bottleneck. These algorithms work by lazily computing the shortest potentially feasible path, evaluating edges along that path, and repeating until a feasible path is found. The order in which edges are selected is critical to minimizing the total number of edge evaluations: a good edge selector chooses edges that are not only likely to be invalid, but also eliminates future paths from consideration. We wish to learn such a selector by leveraging prior experience. We formulate this problem as a Markov Decision Process (MDP) on the state of the search problem. While solving this large MDP is generally intractable, we show that we can compute oracular selectors that can solve the MDP during training. With access to such oracles, we use imitation learning to find effective policies. If new search problems are sufficiently similar to problems solved during training, the learned policy will choose a good edge evaluation ordering and solve the motion planning problem quickly. We evaluate our algorithms on a wide range of 2D and 7D problems and show that the learned selector outperforms baseline commonly used heuristics. We further provide a novel theoretical analysis of lazy search in a Bayesian framework as well as regret guarantees on our imitation learning based approach to motion planning.

Cognitively Inspired Learning of Incremental Drifting Concepts arxiv:2110.04662 📈 2

Mohammad Rostami, Aram Galstyan

**Abstract:** Humans continually expand their learned knowledge to new domains and learn new concepts without any interference with past learned experiences. In contrast, machine learning models perform poorly in a continual learning setting, where input data distribution changes over time. Inspired by the nervous system learning mechanisms, we develop a computational model that enables a deep neural network to learn new concepts and expand its learned knowledge to new domains incrementally in a continual learning setting. We rely on the Parallel Distributed Processing theory to encode abstract concepts in an embedding space in terms of a multimodal distribution. This embedding space is modeled by internal data representations in a hidden network layer. We also leverage the Complementary Learning Systems theory to equip the model with a memory mechanism to overcome catastrophic forgetting through implementing pseudo-rehearsal. Our model can generate pseudo-data points for experience replay and accumulate new experiences to past learned experiences without causing cross-task interference.

Exploring constraints on CycleGAN-based CBCT enhancement for adaptive radiotherapy arxiv:2110.04659 📈 2

Suraj Pai

**Abstract:** Research exploring CycleGAN-based synthetic image generation has recently accelerated in the medical community, as it is able to leverage unpaired datasets effectively. However, clinical acceptance of these synthetic images pose a significant challenge as they are subject to strict evaluation protocols. A commonly established drawback of the CycleGAN, the introduction of artifacts in generated images is unforgivable in the case of medical images. In an attempt to alleviate this drawback, we explore different constraints of the CycleGAN along with investigation of adaptive control of these constraints. The benefits of imposing additional constraints on the CycleGAN, in the form of structure retaining losses is also explored. A generalized frequency loss inspired by arxiv:2012.12821 that preserves content in the frequency domain between source and target is investigated and compared with existing losses such as the MIND loss arXiv:1809.04536. CycleGAN implementations from the ganslate framework (https://github.com/ganslate-team/ganslate) are used for experimentation in this thesis. Synthetic images generated from our methods are quantitatively and qualitatively investigated and outperform the baseline CycleGAN and other approaches. Furthermore, no observable artifacts or loss in image quality is found, which is critical for acceptance of these synthetic images. The synthetic medical images thus generated are also evaluated using domain-specific evaluation and using segmentation as a downstream task, in order to clearly highlight their applicability to clinical workflows.

Breaking the Sample Complexity Barrier to Regret-Optimal Model-Free Reinforcement Learning arxiv:2110.04645 📈 2

Gen Li, Laixi Shi, Yuxin Chen, Yuantao Gu, Yuejie Chi

**Abstract:** Achieving sample efficiency in online episodic reinforcement learning (RL) requires optimally balancing exploration and exploitation. When it comes to a finite-horizon episodic Markov decision process with $S$ states, $A$ actions and horizon length $H$, substantial progress has been achieved towards characterizing the minimax-optimal regret, which scales on the order of $\sqrt{H^2SAT}$ (modulo log factors) with $T$ the total number of samples. While several competing solution paradigms have been proposed to minimize regret, they are either memory-inefficient, or fall short of optimality unless the sample size exceeds an enormous threshold (e.g., $S^6A^4 \,\mathrm{poly}(H)$ for existing model-free methods). To overcome such a large sample size barrier to efficient RL, we design a novel model-free algorithm, with space complexity $O(SAH)$, that achieves near-optimal regret as soon as the sample size exceeds the order of $SA\,\mathrm{poly}(H)$. In terms of this sample size requirement (also referred to the initial burn-in cost), our method improves -- by at least a factor of $S^5A^3$ -- upon any prior memory-efficient algorithm that is asymptotically regret-optimal. Leveraging the recently introduced variance reduction strategy (also called {\em reference-advantage decomposition}), the proposed algorithm employs an {\em early-settled} reference update rule, with the aid of two Q-learning sequences with upper and lower confidence bounds. The design principle of our early-settled variance reduction method might be of independent interest to other RL settings that involve intricate exploration-exploitation trade-offs.

Universal Paralinguistic Speech Representations Using Self-Supervised Conformers arxiv:2110.04621 📈 2

Joel Shor, Aren Jansen, Wei Han, Daniel Park, Yu Zhang

**Abstract:** Many speech applications require understanding aspects beyond the words being spoken, such as recognizing emotion, detecting whether the speaker is wearing a mask, or distinguishing real from synthetic speech. In this work, we introduce a new state-of-the-art paralinguistic representation derived from large-scale, fully self-supervised training of a 600M+ parameter Conformer-based architecture. We benchmark on a diverse set of speech tasks and demonstrate that simple linear classifiers trained on top of our time-averaged representation outperform nearly all previous results, in some cases by large margins. Our analyses of context-window size demonstrate that, surprisingly, 2 second context-windows achieve 98% the performance of the Conformers that use the full long-term context. Furthermore, while the best per-task representations are extracted internally in the network, stable performance across several layers allows a single universal representation to reach near optimal performance on all tasks.

A Review of Physics-based Machine Learning in Civil Engineering arxiv:2110.04600 📈 2

Shashank Reddy Vadyala, Sai Nethra Betgeri1, Dr. John C. Matthews, Dr. Elizabeth Matthews

**Abstract:** The recent development of machine learning (ML) and Deep Learning (DL) increases the opportunities in all the sectors. ML is a significant tool that can be applied across many disciplines, but its direct application to civil engineering problems can be challenging. ML for civil engineering applications that are simulated in the lab often fail in real-world tests. This is usually attributed to a data mismatch between the data used to train and test the ML model and the data it encounters in the real world, a phenomenon known as data shift. However, a physics-based ML model integrates data, partial differential equations (PDEs), and mathematical models to solve data shift problems. Physics-based ML models are trained to solve supervised learning tasks while respecting any given laws of physics described by general nonlinear equations. Physics-based ML, which takes center stage across many science disciplines, plays an important role in fluid dynamics, quantum mechanics, computational resources, and data storage. This paper reviews the history of physics-based ML and its application in civil engineering.

Human-Aware Robot Navigation via Reinforcement Learning with Hindsight Experience Replay and Curriculum Learning arxiv:2110.04564 📈 2

Keyu Li, Ye Lu, Max Q. -H. Meng

**Abstract:** In recent years, the growing demand for more intelligent service robots is pushing the development of mobile robot navigation algorithms to allow safe and efficient operation in a dense crowd. Reinforcement learning (RL) approaches have shown superior ability in solving sequential decision making problems, and recent work has explored its potential to learn navigation polices in a socially compliant manner. However, the expert demonstration data used in existing methods is usually expensive and difficult to obtain. In this work, we consider the task of training an RL agent without employing the demonstration data, to achieve efficient and collision-free navigation in a crowded environment. To address the sparse reward navigation problem, we propose to incorporate the hindsight experience replay (HER) and curriculum learning (CL) techniques with RL to efficiently learn the optimal navigation policy in the dense crowd. The effectiveness of our method is validated in a simulated crowd-robot coexisting environment. The results demonstrate that our method can effectively learn human-aware navigation without requiring additional demonstration data.

Invertible Tone Mapping with Selectable Styles arxiv:2110.04491 📈 2

Zhuming Zhang, Menghan Xia, Xueting Liu, Chengze Li, Tien-Tsin Wong

**Abstract:** Although digital cameras can acquire high-dynamic range (HDR) images, the captured HDR information are mostly quantized to low-dynamic range (LDR) images for display compatibility and compact storage. In this paper, we propose an invertible tone mapping method that converts the multi-exposure HDR to a true LDR (8-bit per color channel) and reserves the capability to accurately restore the original HDR from this {\em invertible LDR}. Our invertible LDR can mimic the appearance of a user-selected tone mapping style. It can be shared over any existing social network platforms that may re-encode or format-convert the uploaded images, without much hurting the accuracy of the restored HDR counterpart. To achieve this, we regard the tone mapping and the restoration as coupled processes, and formulate them as an encoding-and-decoding problem through convolutional neural networks. Particularly, our model supports pluggable style modulators, each of which bakes a specific tone mapping style, and thus favors the application flexibility. Our method is evaluated with a rich variety of HDR images and multiple tone mapping operators, which shows the superiority over relevant state-of-the-art methods. Also, we conduct ablation study to justify our design and discuss the robustness and generality toward real applications.

Application of quantum computing to a linear non-Gaussian acyclic model for novel medical knowledge discovery arxiv:2110.04485 📈 2

Hideaki Kawaguchi

**Abstract:** Recently, with the digitalization of medicine, the utilization of real-world medical data collected from clinical sites has been attracting attention. In this study, quantum computing was applied to a linear non-Gaussian acyclic model to discover causal relationships from real-world medical data alone. Specifically, the independence measure of DirectLiNGAM, a causal discovery algorithm, was calculated using the quantum kernel and its accuracy on real-world medical data was verified. When DirectLiNGAM with the quantum kernel (qLiNGAM) was applied to real-world medical data, a case was confirmed in which the causal structure could be correctly estimated when the amount of data was small, which was not possible with existing methods. Furthermore, qLiNGAM was implemented on real quantum hardware in an experiment using IBMQ. It is suggested that qLiNGAM may be able to discover new medical knowledge and contribute to the solution of medical problems, even when only a small amount of data is available.

Provably Efficient Black-Box Action Poisoning Attacks Against Reinforcement Learning arxiv:2110.04471 📈 2

Guanlin Liu, Lifeng Lai

**Abstract:** Due to the broad range of applications of reinforcement learning (RL), understanding the effects of adversarial attacks against RL model is essential for the safe applications of this model. Prior theoretical works on adversarial attacks against RL mainly focus on either observation poisoning attacks or environment poisoning attacks. In this paper, we introduce a new class of attacks named action poisoning attacks, where an adversary can change the action signal selected by the agent. Compared with existing attack models, the attacker's ability in the proposed action poisoning attack model is more restricted, which brings some design challenges. We study the action poisoning attack in both white-box and black-box settings. We introduce an adaptive attack scheme called LCB-H, which works for most RL agents in the black-box setting. We prove that the LCB-H attack can force any efficient RL agent, whose dynamic regret scales sublinearly with the total number of steps taken, to choose actions according to a policy selected by the attacker very frequently, with only sublinear cost. In addition, we apply LCB-H attack against a popular model-free RL algorithm: UCB-H. We show that, even in the black-box setting, by spending only logarithm cost, the proposed LCB-H attack scheme can force the UCB-H agent to choose actions according to the policy selected by the attacker very frequently.

Stability of Neural Networks on Manifolds to Relative Perturbations arxiv:2110.04702 📈 1

Zhiyang Wang, Luana Ruiz, Alejandro Ribeiro

**Abstract:** Graph Neural Networks (GNNs) show impressive performance in many practical scenarios, which can be largely attributed to their stability properties. Empirically, GNNs can scale well on large size graphs, but this is contradicted by the fact that existing stability bounds grow with the number of nodes. Graphs with well-defined limits can be seen as samples from manifolds. Hence, in this paper, we analyze the stability properties of convolutional neural networks on manifolds to understand the stability of GNNs on large graphs. Specifically, we focus on stability to relative perturbations of the Laplace-Beltrami operator. To start, we construct frequency ratio threshold filters which separate the infinite-dimensional spectrum of the Laplace-Beltrami operator. We then prove that manifold neural networks composed of these filters are stable to relative operator perturbations. As a product of this analysis, we observe that manifold neural networks exhibit a trade-off between stability and discriminability. Finally, we illustrate our results empirically in a wireless resource allocation scenario where the transmitter-receiver pairs are assumed to be sampled from a manifold.

Surrogate-Assisted Reference Vector Adaptation to Various Pareto Front Shapes for Many-Objective Bayesian Optimization arxiv:2110.04689 📈 1

Nobuo Namura

**Abstract:** We propose a surrogate-assisted reference vector adaptation (SRVA) method to solve expensive multi- and many-objective optimization problems with various Pareto front shapes. SRVA is coupled with a multi-objective Bayesian optimization (MBO) algorithm using reference vectors for scalarization of objective functions. The Kriging surrogate models for MBO is used to estimate the Pareto front shape and generate adaptive reference vectors uniformly distributed on the estimated Pareto front. We combine SRVA with expected improvement of penalty-based boundary intersection as an infill criterion for MBO. The proposed algorithm is compared with two other MBO algorithms by applying them to benchmark problems with various Pareto front shapes. Experimental results show that the proposed algorithm outperforms the other two in the problems whose objective functions are reasonably approximated by the Kriging models. SRVA improves diversity of non-dominated solutions for these problems with continuous, discontinuous, and degenerated Pareto fronts. Besides, the proposed algorithm obtains much better solutions from early stages of optimization especially in many-objective problems.

Learning to Control Complex Robots Using High-Dimensional Interfaces: Preliminary Insights arxiv:2110.04663 📈 1

Jongmin M. Lee, Temesgen Gebrekristos, Dalia De Santis, Mahdieh Nejati-Javaremi, Deepak Gopinath, Biraj Parikh, Ferdinando A. Mussa-Ivaldi, Brenna D. Argall

**Abstract:** Human body motions can be captured as a high-dimensional continuous signal using motion sensor technologies. The resulting data can be surprisingly rich in information, even when captured from persons with limited mobility. In this work, we explore the use of limited upper-body motions, captured via motion sensors, as inputs to control a 7 degree-of-freedom assistive robotic arm. It is possible that even dense sensor signals lack the salient information and independence necessary for reliable high-dimensional robot control. As the human learns over time in the context of this limitation, intelligence on the robot can be leveraged to better identify key learning challenges, provide useful feedback, and support individuals until the challenges are managed. In this short paper, we examine two uninjured participants' data from an ongoing study, to extract preliminary results and share insights. We observe opportunities for robot intelligence to step in, including the identification of inconsistencies in time spent across all control dimensions, asymmetries in individual control dimensions, and user progress in learning. Machine reasoning about these situations may facilitate novel interface learning in the future.

Topological Data Analysis (TDA) Techniques Enhance Hand Pose Classification from ECoG Neural Recordings arxiv:2110.04653 📈 1

Simone Azeglio, Arianna Di Bernardo, Gabriele Penna, Fabrizio Pittatore, Simone Poetto, Johannes Gruenwald, Christoph Kapeller, Kyousuke Kamada, Christoph Guger

**Abstract:** Electrocorticogram (ECoG) well characterizes hand movement intentions and gestures. In the present work we aim to investigate the possibility to enhance hand pose classification, in a Rock-Paper-Scissor - and Rest - task, by introducing topological descriptors of time series data. We hypothesized that an innovative approach based on topological data analysis can extract hidden information that are not detectable with standard Brain Computer Interface (BCI)techniques. To investigate this hypothesis, we integrate topological features together with power band features and feed them to several standard classifiers, e.g. Random Forest,Gradient Boosting. Model selection is thus completed after a meticulous phase of bayesian hyperparameter optimization. With our method, we observed robust results in terms of ac-curacy for a four-labels classification problem, with limited available data. Through feature importance investigation, we conclude that topological descriptors are able to extract useful discriminative information and provide novel insights.Since our data are restricted to single-patient recordings, generalization might be limited. Nevertheless, our method can be extended and applied to a wide range of neurophysiological recordings and it might be an intriguing point of departure for future studies.

An Independent Learning Algorithm for a Class of Symmetric Stochastic Games arxiv:2110.04638 📈 1

Bora Yongacoglu, Gürdal Arslan, Serdar Yüksel

**Abstract:** In multi-agent reinforcement learning, independent learners are those that do not access the action selections of other learning agents in the system. This paper investigates the feasibility of using independent learners to find approximate equilibrium policies in non-episodic, discounted stochastic games. We define a property, here called the $ε$-revision paths property, and prove that a class of games exhibiting symmetry among the players has this property for any $ε\geq 0$. Building on this result, we present an independent learning algorithm that comes with high probability guarantees of approximate equilibrium in this class of games. This guarantee is made assuming symmetry alone, without additional assumptions such as a zero sum, team, or potential game structure.

Teaching Robots to Grasp Like Humans: An Interactive Approach arxiv:2110.04534 📈 1

Anna Mészáros, Giovanni Franzese, Jens Kober

**Abstract:** This work investigates how the intricate task of grasping may be learned from humans based on demonstrations and corrections. Due to the complexity of the task, these demonstrations are often slow and even slightly flawed, particularly at moments when multiple aspects (i.e., end-effector movement, orientation, and gripper width) have to be demonstrated at once. Rather than training a person to provide better demonstrations, non-expert users are provided with the ability to interactively modify the dynamics of their initial demonstration through teleoperated corrective feedback. This in turn allows them to teach motions outside of their own physical capabilities. In the end, the goal is to obtain a faster but reliable execution of the task. The presented framework learns the desired movement dynamics based on the current Cartesian Position with Gaussian Processes (GP), resulting in a reactive, time-invariant policy. Using GPs also allows online interactive corrections and active disturbance rejection through epistemic uncertainty minimization. The experimental evaluation of the framework is carried out on a Franka-Emika Panda.

An Empirical Study on Compressed Decentralized Stochastic Gradient Algorithms with Overparameterized Models arxiv:2110.04523 📈 1

Arjun Ashok Rao, Hoi-To Wai

**Abstract:** This paper considers decentralized optimization with application to machine learning on graphs. The growing size of neural network (NN) models has motivated prior works on decentralized stochastic gradient algorithms to incorporate communication compression. On the other hand, recent works have demonstrated the favorable convergence and generalization properties of overparameterized NNs. In this work, we present an empirical analysis on the performance of compressed decentralized stochastic gradient (DSG) algorithms with overparameterized NNs. Through simulations on an MPI network environment, we observe that the convergence rates of popular compressed DSG algorithms are robust to the size of NNs. Our findings suggest a gap between theories and practice of the compressed DSG algorithms in the existing literature.

Themis: A Network Bandwidth-Aware Collective Scheduling Policy for Distributed Training of DL Models arxiv:2110.04478 📈 1

Saeed Rashidi, William Won, Sudarshan Srinivasan, Srinivas Sridharan, Tushar Krishna

**Abstract:** The continuous growth in both size and training data for modern Deep Neural Networks (DNNs) models has led to training tasks taking days or even months. Distributed training is a solution to reduce training time by splitting the task across multiple NPUs (e.g., GPU/TPU). However, distributed training adds communication overhead between the NPUs in order to synchronize the gradients and/or activation, depending on the parallelization strategy. In today's datacenters, for training at scale, NPUs are connected through multi-dimensional interconnection links with different bandwidth and latency. Hence, keeping all network dimensions busy and maximizing the network BW is a challenging task in such a hybrid network environment, as this work identifies. We propose Themis, a novel collective scheduling scheme that dynamically schedules collectives (divided into chunks) to balance the communication loads across all dimensions, further improving the network BW utilization. Our results show that on average, Themis can improve the network BW utilization of single All-Reduce by 1.88x (2.92x max), and improve the end-to-end training iteration performance of real workloads such as ResNet-50, GNMT, DLRM, and Transformer- 1T by 1.49x (1.96x max), 1.41x (1.81x max), 1.42x (1.80x max), and 1.35x (1.78x max), respectively.

Self-adaptive Multi-task Particle Swarm Optimization arxiv:2110.04473 📈 1

Xiaolong Zheng, Deyun Zhou, Na Li, Yu Lei, Tao Wu, Maoguo Gong

**Abstract:** Multi-task optimization (MTO) studies how to simultaneously solve multiple optimization problems for the purpose of obtaining better performance on each problem. Over the past few years, evolutionary MTO (EMTO) was proposed to handle MTO problems via evolutionary algorithms. So far, many EMTO algorithms have been developed and demonstrated well performance on solving real-world problems. However, there remain many works to do in adapting knowledge transfer to task relatedness in EMTO. Different from the existing works, we develop a self-adaptive multi-task particle swarm optimization (SaMTPSO) through the developed knowledge transfer adaptation strategy, the focus search strategy and the knowledge incorporation strategy. In the knowledge transfer adaptation strategy, each task has a knowledge source pool that consists of all knowledge sources. Each source (task) outputs knowledge to the task. And knowledge transfer adapts to task relatedness via individuals' choice on different sources of a pool, where the chosen probabilities for different sources are computed respectively according to task's success rate in generating improved solutions via these sources. In the focus search strategy, if there is no knowledge source benefit the optimization of a task, then all knowledge sources in the task's pool are forbidden to be utilized except the task, which helps to improve the performance of the proposed algorithm. Note that the task itself is as a knowledge source of its own. In the knowledge incorporation strategy, two different forms are developed to help the SaMTPSO explore and exploit the transferred knowledge from a chosen source, each leading to a version of the SaMTPSO. Several experiments are conducted on two test suites. The results of the SaMTPSO are comparing to that of 3 popular EMTO algorithms and a particle swarm algorithm, which demonstrates the superiority of the SaMTPSO.

ProductAE: Towards Training Larger Channel Codes based on Neural Product Codes arxiv:2110.04466 📈 1

Mohammad Vahid Jamali, Hamid Saber, Homayoon Hatami, Jung Hyun Bae

**Abstract:** There have been significant research activities in recent years to automate the design of channel encoders and decoders via deep learning. Due the dimensionality challenge in channel coding, it is prohibitively complex to design and train relatively large neural channel codes via deep learning techniques. Consequently, most of the results in the literature are limited to relatively short codes having less than 100 information bits. In this paper, we construct ProductAEs, a computationally efficient family of deep-learning driven (encoder, decoder) pairs, that aim at enabling the training of relatively large channel codes (both encoders and decoders) with a manageable training complexity. We build upon the ideas from classical product codes, and propose constructing large neural codes using smaller code components. More specifically, instead of directly training the encoder and decoder for a large neural code of dimension $k$ and blocklength $n$, we provide a framework that requires training neural encoders and decoders for the code parameters $(k_1,n_1)$ and $(k_2,n_2)$ such that $k_1 k_2=k$ and $n_1 n_2=n$. Our training results show significant gains, over all ranges of signal-to-noise ratio (SNR), for a code of parameters $(100,225)$ and a moderate-length code of parameters $(196,441)$, over polar codes under successive cancellation (SC) decoder. Moreover, our results demonstrate meaningful gains over Turbo Autoencoder (TurboAE) and state-of-the-art classical codes. This is the first work to design product autoencoders and a pioneering work on training large channel codes.

Deep Joint Source-Channel Coding for Wireless Image Transmission with Adaptive Rate Control arxiv:2110.04456 📈 1

Mingyu Yang, Hun-Seok Kim

**Abstract:** We present a novel adaptive deep joint source-channel coding (JSCC) scheme for wireless image transmission. The proposed scheme supports multiple rates using a single deep neural network (DNN) model and learns to dynamically control the rate based on the channel condition and image contents. Specifically, a policy network is introduced to exploit the tradeoff space between the rate and signal quality. To train the policy network, the Gumbel-Softmax trick is adopted to make the policy network differentiable and hence the whole JSCC scheme can be trained end-to-end. To the best of our knowledge, this is the first deep JSCC scheme that can automatically adjust its rate using a single network model. Experiments show that our scheme successfully learns a reasonable policy that decreases channel bandwidth utilization for high SNR scenarios or simple image contents. For an arbitrary target rate, our rate-adaptive scheme using a single model achieves similar performance compared to an optimized model specifically trained for that fixed target rate. To reproduce our results, we make the source code publicly available at https://github.com/mingyuyng/Dynamic_JSCC.

Next Page