Summary for 2021-03-03, created on 2021-12-22

Minimum-Distortion Embedding arxiv:2103.02559 📈 172

Akshay Agrawal, Alnur Ali, Stephen Boyd

**Abstract:** We consider the vector embedding problem. We are given a finite set of items, with the goal of assigning a representative vector to each one, possibly under some constraints (such as the collection of vectors being standardized, i.e., having zero mean and unit covariance). We are given data indicating that some pairs of items are similar, and optionally, some other pairs are dissimilar. For pairs of similar items, we want the corresponding vectors to be near each other, and for dissimilar pairs, we want the corresponding vectors to not be near each other, measured in Euclidean distance. We formalize this by introducing distortion functions, defined for some pairs of the items. Our goal is to choose an embedding that minimizes the total distortion, subject to the constraints. We call this the minimum-distortion embedding (MDE) problem. The MDE framework is simple but general. It includes a wide variety of embedding methods, such as spectral embedding, principal component analysis, multidimensional scaling, dimensionality reduction methods (like Isomap and UMAP), force-directed layout, and others. It also includes new embeddings, and provides principled ways of validating historical and new embeddings alike. We develop a projected quasi-Newton method that approximately solves MDE problems and scales to large data sets. We implement this method in PyMDE, an open-source Python package. In PyMDE, users can select from a library of distortion functions and constraints or specify custom ones, making it easy to rapidly experiment with different embeddings. Our software scales to data sets with millions of items and tens of millions of distortion functions. To demonstrate our method, we compute embeddings for several real-world data sets, including images, an academic co-author network, US county demographic data, and single-cell mRNA transcriptomes.
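
The objective described in this abstract can be sketched in a few lines. The following is a minimal illustration, not PyMDE's API: it evaluates the total distortion of a fixed embedding by summing one distortion function per listed pair, each applied to the Euclidean distance between the two vectors. The function names (`total_distortion`, `attract`, `repel`) and the two example distortion functions are hypothetical choices made here for illustration.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def total_distortion(embedding, distortions):
    """Sum the distortion over all pairs that have a distortion function.

    embedding:   dict mapping item -> vector (tuple of floats)
    distortions: dict mapping (item_i, item_j) -> function of the distance
    """
    return sum(
        f(euclidean(embedding[i], embedding[j]))
        for (i, j), f in distortions.items()
    )


# Hypothetical distortion functions (assumptions, not taken from the paper):
def attract(d):
    """For similar pairs: grows as the vectors move apart."""
    return d ** 2


def repel(d):
    """For dissimilar pairs: grows as the vectors move together."""
    return 1.0 / (1.0 + d)


embedding = {"a": (0.0, 0.0), "b": (3.0, 4.0), "c": (0.0, 1.0)}
distortions = {("a", "c"): attract, ("a", "b"): repel}
total = total_distortion(embedding, distortions)
```

An MDE solver would search over `embedding` to make `total` small while respecting the chosen constraints; this sketch only evaluates the objective for a fixed embedding.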

D'ya like DAGs? A Survey on Structure Learning and Causal Discovery arxiv:2103.02582 📈 92

Matthew J. Vowels, Necati Cihan Camgoz, Richard Bowden

**Abstract:** Causal reasoning is a crucial part of science and human intelligence. In order to discover causal relationships from data, we need structure discovery methods. We provide a review of background theory and a survey of methods for structure discovery. We primarily focus on modern, continuous optimization methods, and provide reference to further resources such as benchmark datasets and software packages. Finally, we discuss the assumptive leap required to take us from structure to causality.

Towards Open World Object Detection arxiv:2103.02603 📈 87

K J Joseph, Salman Khan, Fahad Shahbaz Khan, Vineeth N Balasubramanian

**Abstract:** Humans have a natural instinct to identify unknown object instances in their environments. The intrinsic curiosity about these unknown instances aids in learning about them, when the corresponding knowledge is eventually available. This motivates us to propose a novel computer vision problem called 'Open World Object Detection', where a model is tasked to: 1) identify objects that have not been introduced to it as 'unknown', without explicit supervision to do so, and 2) incrementally learn these identified unknown categories without forgetting previously learned classes, when the corresponding labels are progressively received. We formulate the problem, introduce a strong evaluation protocol and provide a novel solution, which we call ORE: Open World Object Detector, based on contrastive clustering and energy based unknown identification. Our experimental evaluation and ablation studies analyze the efficacy of ORE in achieving Open World objectives. As an interesting by-product, we find that identifying and characterizing unknown instances helps to reduce confusion in an incremental object detection setting, where we achieve state-of-the-art performance, with no extra methodological effort. We hope that our work will attract further research into this newly identified, yet crucial research direction.

Adaptive Consistency Regularization for Semi-Supervised Transfer Learning arxiv:2103.02193 📈 55

Abulikemu Abuduweili, Xingjian Li, Humphrey Shi, Cheng-Zhong Xu, Dejing Dou

**Abstract:** While recent studies on semi-supervised learning have shown remarkable progress in leveraging both labeled and unlabeled data, most of them presume a basic setting in which the model is randomly initialized. In this work, we consider semi-supervised learning and transfer learning jointly, leading to a more practical and competitive paradigm that can utilize both powerful pre-trained models from the source domain as well as labeled/unlabeled data in the target domain. To better exploit the value of both pre-trained weights and unlabeled target examples, we introduce adaptive consistency regularization that consists of two complementary components: Adaptive Knowledge Consistency (AKC) on the examples between the source and target model, and Adaptive Representation Consistency (ARC) on the target model between labeled and unlabeled examples. Examples involved in the consistency regularization are adaptively selected according to their potential contributions to the target task. We conduct extensive experiments on popular benchmarks including CIFAR-10, CUB-200, and MURA, by fine-tuning the ImageNet pre-trained ResNet-50 model. Results show that our proposed adaptive consistency regularization outperforms state-of-the-art semi-supervised learning techniques such as Pseudo Label, Mean Teacher, and FixMatch. Moreover, our algorithm is orthogonal to existing methods and thus able to gain additional improvements on top of MixMatch and FixMatch. Our code is available at https://github.com/SHI-Labs/Semi-Supervised-Transfer-Learning.

Out of Distribution Generalization in Machine Learning arxiv:2103.02667 📈 49

Martin Arjovsky

**Abstract:** Machine learning has achieved tremendous success in a variety of domains in recent years. However, a lot of these success stories have been in places where the training and the testing distributions are extremely similar to each other. In everyday situations, when models are tested on data slightly different from the data they were trained on, ML algorithms can fail spectacularly. This research attempts to formally define this problem, what sets of assumptions are reasonable to make in our data and what kind of guarantees we hope to obtain from them. Then, we focus on a certain class of out of distribution problems, their assumptions, and introduce simple algorithms that follow from these assumptions that are able to provide more reliable generalization. A central topic in the thesis is the strong link between discovering the causal structure of the data, finding features that are reliable (when using them to predict) regardless of their context, and out of distribution generalization.

Learning the Next Best View for 3D Point Clouds via Topological Features arxiv:2103.02789 📈 28

Christopher Collander, William J. Beksi, Manfred Huber

**Abstract:** In this paper, we introduce a reinforcement learning approach utilizing a novel topology-based information gain metric for directing the next best view of a noisy 3D sensor. The metric combines the disjoint sections of an observed surface to focus on high-detail features such as holes and concave sections. Experimental results show that our approach can aid in establishing the placement of a robotic sensor to optimize the information provided by its streaming point cloud data. Furthermore, a labeled dataset of 3D objects, a CAD design for a custom robotic manipulator, and software for the transformation, union, and registration of point clouds have been publicly released to the research community.

COIN: COmpression with Implicit Neural representations arxiv:2103.03123 📈 22

Emilien Dupont, Adam Goliński, Milad Alizadeh, Yee Whye Teh, Arnaud Doucet

**Abstract:** We propose a new simple approach for image compression: instead of storing the RGB values for each pixel of an image, we store the weights of a neural network overfitted to the image. Specifically, to encode an image, we fit it with an MLP which maps pixel locations to RGB values. We then quantize and store the weights of this MLP as a code for the image. To decode the image, we simply evaluate the MLP at every pixel location. We found that this simple approach outperforms JPEG at low bit-rates, even without entropy coding or learning a distribution over weights. While our framework is not yet competitive with state of the art compression methods, we show that it has various attractive properties which could make it a viable alternative to other neural data compression approaches.
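
Because the code for an image in this scheme is just the quantized MLP weights, the bit-rate follows from simple arithmetic: number of parameters times bits per weight, divided by the number of pixels. The sketch below illustrates that calculation only. The layer widths, the 16-bit weight precision, and the 768x512 image size are hypothetical values chosen for illustration, not figures from the paper.

```python
def mlp_param_count(layer_widths):
    """Count weights and biases of a fully connected network.

    layer_widths lists the width of every layer, input first, output last.
    """
    return sum(
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_widths, layer_widths[1:])
    )


def bits_per_pixel(layer_widths, bits_per_weight, height, width):
    """Bit-rate when the only stored data are the quantized parameters."""
    total_bits = mlp_param_count(layer_widths) * bits_per_weight
    return total_bits / (height * width)


# Hypothetical configuration: (x, y) -> two hidden layers of 32 units -> (r, g, b)
widths = [2, 32, 32, 3]
n_params = mlp_param_count(widths)
bpp = bits_per_pixel(widths, bits_per_weight=16, height=512, width=768)
```

Under these assumptions the rate depends only on the network size and weight precision, so making the MLP smaller or quantizing more coarsely lowers the bit-rate directly, at the cost of reconstruction quality.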

Energy-Based Learning for Scene Graph Generation arxiv:2103.02221 📈 14

Mohammed Suhail, Abhay Mittal, Behjat Siddiquie, Chris Broaddus, Jayan Eledath, Gerard Medioni, Leonid Sigal

**Abstract:** Traditional scene graph generation methods are trained using cross-entropy losses that treat objects and relationships as independent entities. Such a formulation, however, ignores the structure in the output space, in an inherently structured prediction problem. In this work, we introduce a novel energy-based learning framework for generating scene graphs. The proposed formulation allows for efficiently incorporating the structure of scene graphs in the output space. This additional constraint in the learning framework acts as an inductive bias and allows models to learn efficiently from a small number of labels. We use the proposed energy-based framework to train existing state-of-the-art models and obtain a significant performance improvement, of up to 21% and 27%, on the Visual Genome and GQA benchmark datasets, respectively. Furthermore, we showcase the learning efficiency of the proposed framework by demonstrating superior performance in the zero- and few-shot settings where data is scarce.

Deep Recurrent Encoder: A scalable end-to-end network to model brain signals arxiv:2103.02339 📈 10

Omar Chehab, Alexandre Defossez, Jean-Christophe Loiseau, Alexandre Gramfort, Jean-Remi King

**Abstract:** Understanding how the brain responds to sensory inputs is challenging: brain recordings are partial, noisy, and high dimensional; they vary across sessions and subjects and they capture highly nonlinear dynamics. These challenges have led the community to develop a variety of preprocessing and analytical (almost exclusively linear) methods, each designed to tackle one of these issues. Instead, we propose to address these challenges through a specific end-to-end deep learning architecture, trained to predict the brain responses of multiple subjects at once. We successfully test this approach on a large cohort of magnetoencephalography (MEG) recordings acquired during a one-hour reading task. Our Deep Recurrent Encoding (DRE) architecture reliably predicts MEG responses to words with a three-fold improvement over classic linear methods. To overcome the notorious issue of interpretability of deep learning, we describe a simple variable importance analysis. When applied to DRE, this method recovers the expected evoked responses to word length and word frequency. The quantitative improvement of the present deep learning approach paves the way to better understand the nonlinear dynamics of brain activity from large datasets.

Regularizing towards Causal Invariance: Linear Models with Proxies arxiv:2103.02477 📈 9

Michael Oberst, Nikolaj Thams, Jonas Peters, David Sontag

**Abstract:** We propose a method for learning linear models whose predictive performance is robust to causal interventions on unobserved variables, when noisy proxies of those variables are available. Our approach takes the form of a regularization term that trades off between in-distribution performance and robustness to interventions. Under the assumption of a linear structural causal model, we show that a single proxy can be used to create estimators that are prediction optimal under interventions of bounded strength. This strength depends on the magnitude of the measurement noise in the proxy, which is, in general, not identifiable. In the case of two proxy variables, we propose a modified estimator that is prediction optimal under interventions up to a known strength. We further show how to extend these estimators to scenarios where additional information about the "test time" intervention is available during training. We evaluate our theoretical findings in synthetic experiments and using real data of hourly pollution levels across several cities in China.

On the effectiveness of adversarial training against common corruptions arxiv:2103.02325 📈 9

Klim Kireev, Maksym Andriushchenko, Nicolas Flammarion

**Abstract:** The literature on robustness towards common corruptions shows no consensus on whether adversarial training can improve the performance in this setting. First, we show that, when used with an appropriately selected perturbation radius, $\ell_p$ adversarial training can serve as a strong baseline against common corruptions. Then we explain why adversarial training performs better than data augmentation with simple Gaussian noise which has been observed to be a meaningful baseline on common corruptions. Related to this, we identify the $σ$-overfitting phenomenon when Gaussian augmentation overfits to a particular standard deviation used for training which has a significant detrimental effect on common corruption accuracy. We discuss how to alleviate this problem and then how to further enhance $\ell_p$ adversarial training by introducing an efficient relaxation of adversarial training with learned perceptual image patch similarity as the distance metric. Through experiments on CIFAR-10 and ImageNet-100, we show that our approach does not only improve the $\ell_p$ adversarial training baseline but also has cumulative gains with data augmentation methods such as AugMix, ANT, and SIN leading to state-of-the-art performance on common corruptions. The code of our experiments is publicly available at https://github.com/tml-epfl/adv-training-corruptions.

Meta-Curriculum Learning for Domain Adaptation in Neural Machine Translation arxiv:2103.02262 📈 9

Runzhe Zhan, Xuebo Liu, Derek F. Wong, Lidia S. Chao

**Abstract:** Meta-learning has been sufficiently validated to be beneficial for low-resource neural machine translation (NMT). However, we find that meta-trained NMT fails to improve the translation performance of the domain unseen at the meta-training stage. In this paper, we aim to alleviate this issue by proposing a novel meta-curriculum learning for domain adaptation in NMT. During meta-training, the NMT first learns the similar curricula from each domain to avoid falling into a bad local optimum early, and finally learns the curricula of individualities to improve the model robustness for learning domain-specific knowledge. Experimental results on 10 different low-resource domains show that meta-curriculum learning can improve the translation performance of both familiar and unfamiliar domains. All the codes and data are freely available at https://github.com/NLP2CT/Meta-Curriculum.

Compute and memory efficient universal sound source separation arxiv:2103.02644 📈 8

Efthymios Tzinis, Zhepei Wang, Xilin Jiang, Paris Smaragdis

**Abstract:** Recent progress in audio source separation led by deep learning has enabled many neural network models to provide robust solutions to this fundamental estimation problem. In this study, we provide a family of efficient neural network architectures for general purpose audio source separation while focusing on multiple computational aspects that hinder the application of neural networks in real-world scenarios. The backbone structure of this convolutional network is the SUccessive DOwnsampling and Resampling of Multi-Resolution Features (SuDoRM-RF) as well as their aggregation which is performed through simple one-dimensional convolutions. This mechanism enables our models to obtain high fidelity signal separation in a wide variety of settings where a variable number of sources is present and with limited computational resources (e.g. floating point operations, memory footprint, number of parameters and latency). Our experiments show that SuDoRM-RF models perform comparably to, and even surpass, several state-of-the-art benchmarks with significantly higher computational resource requirements. The causal variation of SuDoRM-RF is able to obtain competitive performance in real-time speech separation of around 10dB scale-invariant signal-to-distortion ratio improvement (SI-SDRi) while remaining up to 20 times faster than real-time on a laptop device.
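
For readers unfamiliar with the SI-SDRi metric quoted in this abstract, the following is a minimal sketch of the commonly used definition: the estimate is compared against the reference after projecting it onto the reference, which makes the score insensitive to rescaling the estimate, and the improvement is measured relative to the unprocessed mixture. This is an illustration written for this digest, not the authors' evaluation code; the function names and toy signals are assumptions.

```python
import math


def si_sdr(estimate, reference):
    """Scale-invariant signal-to-distortion ratio in dB.

    Both inputs are equal-length sequences of samples. Raises
    ZeroDivisionError if the estimate is an exact scaled copy of
    the reference (zero residual energy).
    """
    dot = sum(e * r for e, r in zip(estimate, reference))
    ref_energy = sum(r * r for r in reference)
    alpha = dot / ref_energy                      # optimal scaling of the reference
    target = [alpha * r for r in reference]       # part of the estimate explained by the reference
    residual = [e - t for e, t in zip(estimate, target)]
    target_energy = sum(t * t for t in target)
    residual_energy = sum(n * n for n in residual)
    return 10.0 * math.log10(target_energy / residual_energy)


def si_sdr_improvement(estimate, mixture, reference):
    """SI-SDRi: gain of the separated estimate over the raw mixture."""
    return si_sdr(estimate, reference) - si_sdr(mixture, reference)


# Toy signals (hypothetical values for illustration only)
ref = [1.0, 0.0, -1.0, 0.0]
mix = [1.0, 1.0, -1.0, -1.0]
est = [1.0, 0.1, -1.0, -0.1]
improvement = si_sdr_improvement(est, mix, ref)
```

Multiplying `est` by any nonzero constant leaves `si_sdr(est, ref)` unchanged, which is the property the "scale-invariant" name refers to.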

Deep Adaptive Design: Amortizing Sequential Bayesian Experimental Design arxiv:2103.02438 📈 7

Adam Foster, Desi R. Ivanova, Ilyas Malik, Tom Rainforth

**Abstract:** We introduce Deep Adaptive Design (DAD), a method for amortizing the cost of adaptive Bayesian experimental design that allows experiments to be run in real-time. Traditional sequential Bayesian optimal experimental design approaches require substantial computation at each stage of the experiment. This makes them unsuitable for most real-world applications, where decisions must typically be made quickly. DAD addresses this restriction by learning an amortized design network upfront and then using this to rapidly run (multiple) adaptive experiments at deployment time. This network represents a design policy which takes as input the data from previous steps, and outputs the next design using a single forward pass; these design decisions can be made in milliseconds during the live experiment. To train the network, we introduce contrastive information bounds that are suitable objectives for the sequential setting, and propose a customized network architecture that exploits key symmetries. We demonstrate that DAD successfully amortizes the process of experimental design, outperforming alternative strategies on a number of problems.

Relate and Predict: Structure-Aware Prediction with Jointly Optimized Neural DAG arxiv:2103.02405 📈 7

Arshdeep Sekhon, Zhe Wang, Yanjun Qi

**Abstract:** Understanding relationships between feature variables is one important way humans make decisions. However, state-of-the-art deep learning studies either focus on task-agnostic statistical dependency learning or do not model explicit feature dependencies during prediction. We propose a deep neural network framework, dGAP, to learn neural dependency Graph and optimize structure-Aware target Prediction simultaneously. dGAP trains towards a structure self-supervision loss and a target prediction loss jointly. Our method leads to an interpretable model that can disentangle sparse feature relationships, informing the user how relevant dependencies impact the target task. We empirically evaluate dGAP on multiple simulated and real datasets. dGAP is not only more accurate, but can also recover correct dependency structure.

Event-based Synthetic Aperture Imaging with a Hybrid Network arxiv:2103.02376 📈 7

Xiang Zhang, Wei Liao, Lei Yu, Wen Yang, Gui-Song Xia

**Abstract:** Synthetic aperture imaging (SAI) is able to achieve the see through effect by blurring out the off-focus foreground occlusions and reconstructing the in-focus occluded targets from multi-view images. However, very dense occlusions and extreme lighting conditions may bring significant disturbances to the SAI based on conventional frame-based cameras, leading to performance degeneration. To address these problems, we propose a novel SAI system based on the event camera which can produce asynchronous events with extremely low latency and high dynamic range. Thus, it can eliminate the interference of dense occlusions by measuring with almost continuous views, and simultaneously tackle the over/under exposure problems. To reconstruct the occluded targets, we propose a hybrid encoder-decoder network composed of spiking neural networks (SNNs) and convolutional neural networks (CNNs). In the hybrid network, the spatio-temporal information of the collected events is first encoded by SNN layers, and then transformed to the visual image of the occluded targets by a style-transfer CNN decoder. Through experiments, the proposed method shows remarkable performance in dealing with very dense occlusions and extreme lighting conditions, and high quality visual images can be reconstructed using pure event data.

Task Aligned Generative Meta-learning for Zero-shot Learning arxiv:2103.02185 📈 7

Zhe Liu, Yun Li, Lina Yao, Xianzhi Wang, Guodong Long

**Abstract:** Zero-shot learning (ZSL) refers to the problem of learning to classify instances from the novel classes (unseen) that are absent in the training set (seen). Most ZSL methods infer the correlation between visual features and attributes to train the classifier for unseen classes. However, such models may have a strong bias towards seen classes during training. Meta-learning has been introduced to mitigate the bias, but meta-ZSL methods are inapplicable when tasks used for training are sampled from diverse distributions. In this regard, we propose a novel Task-aligned Generative Meta-learning model for Zero-shot learning (TGMZ). TGMZ mitigates the potentially biased training and enables meta-ZSL to accommodate real-world datasets containing diverse distributions. TGMZ incorporates an attribute-conditioned task-wise distribution alignment network that projects tasks into a unified distribution to deliver an unbiased model. Our comparisons with state-of-the-art algorithms show improvements of 2.1%, 3.0%, 2.5%, and 7.6% achieved by TGMZ on the AWA1, AWA2, CUB, and aPY datasets, respectively. TGMZ also outperforms competitors by 3.6% in the generalized zero-shot learning (GZSL) setting and 7.9% in our proposed fusion-ZSL setting.

Multimodal Scale Consistency and Awareness for Monocular Self-Supervised Depth Estimation arxiv:2103.02451 📈 6

Hemang Chawla, Arnav Varma, Elahe Arani, Bahram Zonooz

**Abstract:** Dense depth estimation is essential to scene-understanding for autonomous driving. However, recent self-supervised approaches on monocular videos suffer from scale-inconsistency across long sequences. Utilizing data from the ubiquitously copresent global positioning systems (GPS), we tackle this challenge by proposing a dynamically-weighted GPS-to-Scale (g2s) loss to complement the appearance-based losses. We emphasize that the GPS is needed only during the multimodal training, and not at inference. The relative distance between frames captured through the GPS provides a scale signal that is independent of the camera setup and scene distribution, resulting in richer learned feature representations. Through extensive evaluation on multiple datasets, we demonstrate scale-consistent and -aware depth estimation during inference, improving the performance even when training with low-frequency GPS data.

Reinforcement Learning for Orientation Estimation Using Inertial Sensors with Performance Guarantee arxiv:2103.02357 📈 6

Liang Hu, Yujie Tang, Zhipeng Zhou, Wei Pan

**Abstract:** This paper presents a deep reinforcement learning (DRL) algorithm for orientation estimation using inertial sensors combined with a magnetometer. The Lyapunov method in control theory is employed to prove the convergence of orientation estimation errors. Based on the theoretical results, the estimator gains and a Lyapunov function are parametrized by deep neural networks and learned from samples. The DRL estimator is compared with three well-known orientation estimation methods on both numerical simulations and real datasets collected from commercially available sensors. The results show that the proposed algorithm is superior for arbitrary estimation initialization and can adapt to very large angular velocities for which other algorithms are hardly applicable. To the best of our knowledge, this is the first DRL-based orientation estimation method with an estimation error boundedness guarantee.

Machine Learning using Stata/Python arxiv:2103.03122 📈 5

Giovanni Cerulli

**Abstract:** We present two related Stata modules, r_ml_stata and c_ml_stata, for fitting popular Machine Learning (ML) methods both in regression and classification settings. Using the recent Stata/Python integration platform (sfi) of Stata 16, these commands provide hyper-parameters' optimal tuning via K-fold cross-validation using grid search. More specifically, they make use of the Python Scikit-learn API to carry out both cross-validation and outcome/label prediction.

SVMax: A Feature Embedding Regularizer arxiv:2103.02770 📈 5

Ahmed Taha, Alex Hanson, Abhinav Shrivastava, Larry Davis

**Abstract:** A neural network regularizer (e.g., weight decay) boosts performance by explicitly penalizing the complexity of a network. In this paper, we penalize inferior network activations -- feature embeddings -- which in turn regularize the network's weights implicitly. We propose singular value maximization (SVMax) to learn a more uniform feature embedding. The SVMax regularizer supports both supervised and unsupervised learning. Our formulation mitigates model collapse and enables larger learning rates. We evaluate the SVMax regularizer using both retrieval and generative adversarial networks. We leverage a synthetic mixture of Gaussians dataset to evaluate SVMax in an unsupervised setting. For retrieval networks, SVMax achieves significant improvement margins across various ranking losses. Code available at https://bit.ly/3jNkgDt

Domain Generalization in Vision: A Survey arxiv:2103.02503 📈 5

Kaiyang Zhou, Ziwei Liu, Yu Qiao, Tao Xiang, Chen Change Loy

**Abstract:** Generalization to out-of-distribution (OOD) data is a capability natural to humans yet challenging for machines to reproduce. This is because most learning algorithms strongly rely on the i.i.d. assumption on source/target data, which is often violated in practice due to domain shift. Domain generalization (DG) aims to achieve OOD generalization by using only source data for model learning. Since first introduced in 2011, research in DG has made great progress. In particular, intensive research in this topic has led to a broad spectrum of methodologies, e.g., those based on domain alignment, meta-learning, data augmentation, or ensemble learning, just to name a few; and has covered various vision applications such as object recognition, segmentation, action recognition, and person re-identification. In this paper, for the first time a comprehensive literature review is provided to summarize the developments in DG for computer vision over the past decade. Specifically, we first cover the background by formally defining DG and relating it to other research fields like domain adaptation and transfer learning. Second, we conduct a thorough review into existing methods and present a categorization based on their methodologies and motivations. Finally, we conclude this survey with insights and discussions on future research directions.

Multi-view Audio and Music Classification arxiv:2103.02420 📈 5

Huy Phan, Huy Le Nguyen, Oliver Y. Chén, Lam Pham, Philipp Koch, Ian McLoughlin, Alfred Mertins

**Abstract:** We propose in this work a multi-view learning approach for audio and music classification. Considering four typical low-level representations (i.e. different views) commonly used for audio and music recognition tasks, the proposed multi-view network consists of four subnetworks, each handling one input type. The learned embeddings in the subnetworks are then concatenated to form the multi-view embedding for classification, similar to a simple concatenation network. However, apart from the joint classification branch, the network also maintains four classification branches on the single-view embedding of the subnetworks. A novel method is then proposed to keep track of the learning behavior on the classification branches and adapt their weights to proportionally blend their gradients for network training. The weights are adapted in such a way that learning on a branch that is generalizing well will be encouraged whereas learning on a branch that is overfitting will be slowed down. Experiments on three different audio and music classification tasks show that the proposed multi-view network not only outperforms the single-view baselines but also is superior to the multi-view baselines based on concatenation and late fusion.

Bulk Production Augmentation Towards Explainable Melanoma Diagnosis arxiv:2103.02198 📈 5

Kasumi Obi, Quan Huu Cap, Noriko Umegaki-Arao, Masaru Tanaka, Hitoshi Iyatomi

**Abstract:** Although highly accurate automated diagnostic techniques for melanoma have been reported, the realization of a system capable of providing diagnostic evidence based on medical indices remains an open issue because of difficulties in obtaining reliable training data. In this paper, we propose bulk production augmentation (BPA) to generate high-quality, diverse pseudo-skin tumor images with the desired structural malignant features for additional training images from a limited number of labeled images. The proposed BPA acts as an effective data augmentation in constructing the feature detector for the atypical pigment network (APN), which is a key structure in melanoma diagnosis. Experiments show that training with images generated by our BPA largely boosts the APN detection performance by 20.0 percentage points in the area under the receiver operating characteristic curve, which is 11.5 to 13.7 points higher than that of conventional CycleGAN-based augmentations in AUC.

Stay on Topic, Please: Aligning User Comments to the Content of a News Article arxiv:2103.06130 📈 4

Jumanah Alshehri, Marija Stanojevic, Eduard Dragut, Zoran Obradovic

**Abstract:** Social scientists have shown that up to 50% of the content posted to a news article has no relation to its journalistic content. In this study we propose a classification algorithm to categorize user comments posted to a news article based on their alignment to its content. The alignment seeks to match user comments to an article based on similarity of content, entities in discussion, and topic. We propose BERTAC, a BERT-based approach that jointly learns article-comment embeddings and infers the relevance class of comments. We introduce an ordinal classification loss that penalizes the difference between the predicted and true label. We conduct a thorough study to show the influence of the proposed loss on the learning process. The results on five representative news outlets show that our approach can learn the comment class with up to 36% average accuracy improvement compared to the baselines, and up to 25% compared to the BA-BC model. BA-BC is our approach that consists of two models aimed to capture disjointly the formal language of news articles and the informal language of comments. We also conduct a user study to evaluate human labeling performance to understand the difficulty of the classification task. The user agreement on comment-article alignment is "moderate" per Krippendorff's alpha score, which suggests that the classification task is difficult.

Learning Invariant Representations across Domains and Tasks arxiv:2103.05114 📈 4

Jindong Wang, Wenjie Feng, Chang Liu, Chaohui Yu, Mingxuan Du, Renjun Xu, Tao Qin, Tie-Yan Liu

**Abstract:** Because it is expensive and time-consuming to collect massive COVID-19 image samples to train deep classification models, transfer learning is a promising approach that transfers knowledge from the abundant typical pneumonia datasets for COVID-19 image classification. However, negative transfer may deteriorate the performance due to the feature distribution divergence between the two datasets and the task semantic difference in diagnosing pneumonia and COVID-19, which rely on different characteristics. It is even more challenging when the target dataset has no labels available, i.e., unsupervised task transfer learning. In this paper, we propose a novel Task Adaptation Network (TAN) to solve this unsupervised task transfer problem. In addition to learning transferable features via domain-adversarial training, we propose a novel task semantic adaptor that uses the learning-to-learn strategy to adapt the task semantics. Experiments on three public COVID-19 datasets demonstrate that our proposed method achieves superior performance. In particular, on the COVID-DA dataset, TAN significantly increases the recall and F1 score by 5.0% and 7.8%, respectively, compared to recent strong baselines. Moreover, we show that TAN also achieves superior performance on several public domain adaptation benchmarks.

A Robust Adversarial Network-Based End-to-End Communications System With Strong Generalization Ability Against Adversarial Attacks arxiv:2103.02654 📈 4

Yudi Dong, Huaxia Wang, Yu-Dong Yao

**Abstract:** We propose a novel defensive mechanism based on a generative adversarial network (GAN) framework to defend against adversarial attacks in end-to-end communications systems. Specifically, we utilize a generative network to model a powerful adversary and enable the end-to-end communications system to combat the generative attack network via a minimax game. We show that the proposed system not only works well against white-box and black-box adversarial attacks but also possesses excellent generalization capabilities to maintain good performance under no attacks. We also show that our GAN-based end-to-end system outperforms the conventional communications system and the end-to-end communications system with/without adversarial training.

Approximation Algorithms for Socially Fair Clustering arxiv:2103.02512 📈 4

Yury Makarychev, Ali Vakilian

**Abstract:** We present an $(e^{O(p)} \frac{\log \ell}{\log\log\ell})$-approximation algorithm for socially fair clustering with the $\ell_p$-objective. In this problem, we are given a set of points in a metric space. Each point belongs to one (or several) of $\ell$ groups. The goal is to find a $k$-medians, $k$-means, or, more generally, $\ell_p$-clustering that is simultaneously good for all of the groups. More precisely, we need to find a set of $k$ centers $C$ so as to minimize the maximum over all groups $j$ of $\sum_{u \text{ in group }j} d(u,C)^p$. The socially fair clustering problem was independently proposed by Ghadiri, Samadi, and Vempala [2021] and Abbasi, Bhaskara, and Venkatasubramanian [2021]. Our algorithm improves and generalizes their $O(\ell)$-approximation algorithms for the problem. The natural LP relaxation for the problem has an integrality gap of $Ω(\ell)$. In order to obtain our result, we introduce a strengthened LP relaxation and show that it has an integrality gap of $Θ(\frac{\log \ell}{\log\log\ell})$ for a fixed $p$. Additionally, we present a bicriteria approximation algorithm, which generalizes the bicriteria approximation of Abbasi et al. [2021].
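
The objective stated in the abstract, the maximum over groups $j$ of $\sum_{u \text{ in group } j} d(u,C)^p$, can be evaluated directly for a candidate set of centers. The sketch below assumes Euclidean points and uses a hypothetical helper name, `socially_fair_cost`. It only evaluates the objective and does not implement the paper's approximation algorithm.

```python
from math import dist


def socially_fair_cost(points, groups, centers, p=1):
    """Return max over groups of the sum of d(u, C)^p for u in that group.

    points  -- list of coordinate tuples
    groups  -- list of sets; groups[i] holds the group labels of points[i]
               (a point may belong to several groups)
    centers -- list of coordinate tuples, the candidate center set C
    """
    group_costs = {}
    for point, point_groups in zip(points, groups):
        # d(u, C) is the distance from u to its nearest center.
        d = min(dist(point, c) for c in centers)
        for g in point_groups:
            group_costs[g] = group_costs.get(g, 0.0) + d ** p
    return max(group_costs.values())


points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0), (13.0, 0.0)]
groups = [{"a"}, {"a"}, {"b"}, {"a", "b"}]
centers = [(0.0, 0.0), (10.0, 0.0)]

cost_p1 = socially_fair_cost(points, groups, centers, p=1)  # k-medians style
cost_p2 = socially_fair_cost(points, groups, centers, p=2)  # k-means style
```

Here group "a" pays 0 + 2 + 3 = 5 and group "b" pays 0 + 3 = 3 for p = 1, so the fair cost is the worse of the two, 5.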

DeepFN: Towards Generalizable Facial Action Unit Recognition with Deep Face Normalization arxiv:2103.02484 📈 4

Javier Hernandez, Daniel McDuff, Ognjen Rudovic, Alberto Fung, Mary Czerwinski

**Abstract:** Facial action unit recognition has many applications from market research to psychotherapy and from image captioning to entertainment. Despite its recent progress, deployment of these models has been impeded due to their limited generalization to unseen people and demographics. This work conducts an in-depth analysis of performance across several dimensions: individuals (40 subjects), genders (male and female), skin types (darker and lighter), and databases (BP4D and DISFA). To help suppress the variance in data, we use the notion of self-supervised denoising autoencoders to design a method for deep face normalization (DeepFN) that transfers facial expressions of different people onto a common facial template which is then used to train and evaluate facial action recognition models. We show that person-independent models yield significantly lower performance (55% average F1 and accuracy across 40 subjects) than person-dependent models (60.3%), leading to a generalization gap of 5.3%. However, normalizing the data with the newly introduced DeepFN significantly increased the performance of person-independent models (59.6%), effectively reducing the gap. Similarly, we observed generalization gaps when considering gender (2.4%), skin type (5.3%), and dataset (9.4%), which were significantly reduced with the use of DeepFN. These findings represent an important step towards the creation of more generalizable facial action unit recognition systems.

Continuous Speech Separation with Ad Hoc Microphone Arrays arxiv:2103.02378 📈 4

Dongmei Wang, Takuya Yoshioka, Zhuo Chen, Xiaofei Wang, Tianyan Zhou, Zhong Meng

**Abstract:** Speech separation has been shown to be effective for multi-talker speech recognition. Under the ad hoc microphone array setup where the array consists of spatially distributed asynchronous microphones, additional challenges must be overcome as the geometry and number of microphones are unknown beforehand. Prior studies show that, with a spatial-temporal interleaving structure, neural networks can efficiently utilize the multi-channel signals of the ad hoc array. In this paper, we further extend this approach to continuous speech separation. Several techniques are introduced to enable speech separation for real continuous recordings. First, we apply a transformer-based network for spatio-temporal modeling of the ad hoc array signals. In addition, two methods are proposed to mitigate a speech duplication problem during single talker segments, which seems more severe in the ad hoc array scenarios. One method is device distortion simulation for reducing the acoustic mismatch between simulated training data and real recordings. The other is speaker counting to detect the single speaker segments and merge the output signal channels. Experimental results for AdHoc-LibiCSS, a new dataset consisting of continuous recordings of concatenated LibriSpeech utterances obtained by multiple different devices, show the proposed separation method can significantly improve the ASR accuracy for overlapped speech with little performance degradation for single talker segments.

Unsupervised Vehicle Re-Identification via Self-supervised Metric Learning using Feature Dictionary arxiv:2103.02250 📈 4

Jongmin Yu, Hyeontaek Oh

**Abstract:** The key challenge of unsupervised vehicle re-identification (Re-ID) is learning discriminative features from unlabelled vehicle images. Numerous methods using domain adaptation have achieved outstanding performance, but those methods still need a labelled dataset as a source domain. This paper presents an unsupervised vehicle Re-ID method, which does not need any type of labelled dataset, through Self-supervised Metric Learning (SSML) based on a feature dictionary. Our method initially extracts features from vehicle images and stores them in a dictionary. Thereafter, based on the dictionary, the proposed method conducts dictionary-based positive label mining (DPLM) to search for positive labels. Pair-wise similarity, relative-rank consistency, and adjacent feature distribution similarity are jointly considered to find images that may belong to the same vehicle as a given probe image. The results of DPLM are applied to dictionary-based triplet loss (DTL) to improve the discriminativeness of learnt features and to refine the quality of the results of DPLM progressively. The iterative process with DPLM and DTL boosts the performance of unsupervised vehicle Re-ID. Experimental results demonstrate the effectiveness of the proposed method by producing promising vehicle Re-ID performance without a pre-labelled dataset. The source code for this paper is publicly available on `https://github.com/andreYoo/VeRI_SSML_FD.git'.

Deep Neural Networks for the Assessment of Surgical Skills: A Systematic Review arxiv:2103.05113 📈 3

Erim Yanik, Xavier Intes, Uwe Kruger, Pingkun Yan, David Miller, Brian Van Voorst, Basiel Makled, Jack Norfleet, Suvranu De

**Abstract:** Surgical training in medical school residency programs has followed the apprenticeship model. The learning and assessment process is inherently subjective and time-consuming. Thus, there is a need for objective methods to assess surgical skills. Here, we use the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines to systematically survey the literature on the use of Deep Neural Networks for automated and objective surgical skill assessment, with a focus on kinematic data as putative markers of surgical competency. There is considerable recent interest in deep neural networks (DNN) due to the availability of powerful algorithms, multiple datasets, some of which are publicly available, as well as efficient computational hardware to train and host them. We have reviewed 530 papers, of which we selected 25 for this systematic review. Based on this review, we concluded that DNNs are powerful tools for automated, objective surgical skill assessment using both kinematic and video data. The field would benefit from large, publicly available, annotated datasets that are representative of the surgical trainee and expert demographics and multimodal data beyond kinematics and videos.

Structure-Preserving Progressive Low-rank Image Completion for Defending Adversarial Attacks arxiv:2103.02781 📈 3

Zhiqun Zhao, Hengyou Wang, Hao Sun, Zhihai He

**Abstract:** Deep neural networks recognize objects by analyzing local image details and summarizing their information along the inference layers to derive the final decision. Because of this, they are prone to adversarial attacks. Small sophisticated noise in the input images can accumulate along the network inference path and produce wrong decisions at the network output. On the other hand, human eyes recognize objects based on their global structure and semantic cues, instead of local image textures. Because of this, human eyes can still clearly recognize objects from images which have been heavily damaged by adversarial attacks. This leads to a very interesting approach for defending deep neural networks against adversarial attacks. In this work, we propose to develop a structure-preserving progressive low-rank image completion (SPLIC) method to remove unneeded texture details from the input images and shift the bias of deep neural networks towards global object structures and semantic cues. We formulate the problem into a low-rank matrix completion problem with progressively smoothed rank functions to avoid local minimums during the optimization process. Our experimental results demonstrate that the proposed method is able to successfully remove the insignificant local image details while preserving important global object structures. On black-box, gray-box, and white-box attacks, our method outperforms existing defense methods (by up to 12.6%) and significantly improves the adversarial robustness of the network.

Malware Classification with GMM-HMM Models arxiv:2103.02753 📈 3

Jing Zhao, Samanvitha Basole, Mark Stamp

**Abstract:** Discrete hidden Markov models (HMM) are often applied to malware detection and classification problems. However, the continuous analog of discrete HMMs, that is, Gaussian mixture model-HMMs (GMM-HMM), are rarely considered in the field of cybersecurity. In this paper, we use GMM-HMMs for malware classification and we compare our results to those obtained using discrete HMMs. As features, we consider opcode sequences and entropy-based sequences. For our opcode features, GMM-HMMs produce results that are comparable to those obtained using discrete HMMs, whereas for our entropy-based features, GMM-HMMs generally improve significantly on the classification results that we have achieved with discrete HMMs.

Contrastive learning of strong-mixing continuous-time stochastic processes arxiv:2103.02740 📈 3

Bingbin Liu, Pradeep Ravikumar, Andrej Risteski

**Abstract:** Contrastive learning is a family of self-supervised methods where a model is trained to solve a classification task constructed from unlabeled data. It has recently emerged as one of the leading learning paradigms in the absence of labels across many different domains (e.g. brain imaging, text, images). However, theoretical understanding of many aspects of training, both statistical and algorithmic, remain fairly elusive. In this work, we study the setting of time series -- more precisely, when we get data from a strong-mixing continuous-time stochastic process. We show that a properly constructed contrastive learning task can be used to estimate the transition kernel for small-to-mid-range intervals in the diffusion case. Moreover, we give sample complexity bounds for solving this task and quantitatively characterize what the value of the contrastive loss implies for distributional closeness of the learned kernel. As a byproduct, we illuminate the appropriate settings for the contrastive distribution, as well as other hyperparameters in this setup.

IH-GAN: A Conditional Generative Model for Implicit Surface-Based Inverse Design of Cellular Structures arxiv:2103.02588 📈 3

Jun Wang, Wei Chen, Daicong Da, Mark Fuge, Rahul Rai

**Abstract:** Variable-density cellular structures can overcome connectivity and manufacturability issues of topologically optimized structures, particularly those represented as discrete density maps. However, the optimization of such cellular structures is challenging due to the multiscale design problem. Past work addressing this problem generally either optimizes only the volume fraction of single-type unit cells while ignoring the effects of unit cell geometry on properties, or considers the geometry-property relation but builds this relation via heuristics. In contrast, we propose a simple yet more principled way to accurately model the property to geometry mapping using a conditional deep generative model, named Inverse Homogenization Generative Adversarial Network (IH-GAN). It learns the conditional distribution of unit cell geometries given properties and can realize the one-to-many mapping from geometry to properties. We further reduce the complexity of IH-GAN by using the implicit function parameterization to represent unit cell geometries. Results show that our method can 1) generate various unit cells that satisfy given material properties with high accuracy (relative error <5%) and 2) improve the optimized structural performance over the conventional topology-optimized variable-density structure. Specifically, in the minimum compliance example, our IH-GAN generated structure achieves an 84.4% reduction in concentrated stress and an extra 7% reduction in displacement. In the target deformation examples, our IH-GAN generated structure reduces the target matching error by 24.2% and 44.4% for two test cases, respectively. We also demonstrate that the connectivity issue for multi-type unit cells can be solved by transition layer blending.

Learning to Manipulate Amorphous Materials arxiv:2103.02533 📈 3

Yunbo Zhang, Wenhao Yu, C. Karen Liu, Charles C. Kemp, Greg Turk

**Abstract:** We present a method of training character manipulation of amorphous materials such as those often used in cooking. Common examples of amorphous materials include granular materials (salt, uncooked rice), fluids (honey), and visco-plastic materials (sticky rice, softened butter). A typical task is to spread a given material out across a flat surface using a tool such as a scraper or knife. We use reinforcement learning to train our controllers to manipulate materials in various ways. The training is performed in a physics simulator that uses position-based dynamics of particles to simulate the materials to be manipulated. The neural network control policy is given observations of the material (e.g. a low-resolution density map), and the policy outputs actions such as rotating and translating the knife. We demonstrate policies that have been successfully trained to carry out the following tasks: spreading, gathering, and flipping. We produce a final animation by using inverse kinematics to guide a character's arm and hand to match the motion of the manipulation tool such as a knife or a frying pan.

Root cause prediction based on bug reports arxiv:2103.02372 📈 3

Thomas Hirsch, Birgit Hofer

**Abstract:** This paper proposes a supervised machine learning approach for predicting the root cause of a given bug report. Knowing the root cause of a bug can help developers in the debugging process - either directly or indirectly by choosing proper tool support for the debugging task. We mined 54755 closed bug reports from the issue trackers of 103 GitHub projects and applied a set of heuristics to create a benchmark consisting of 10459 reports. A subset was manually classified into three groups (semantic, memory, and concurrency) based on the bugs' root causes. Since the types of root cause are not equally distributed, a combination of keyword search and random selection was applied. Our data set for the machine learning approach consists of 369 bug reports (122 concurrency, 121 memory, and 126 semantic bugs). The bug reports are used as input to a natural language processing algorithm. We evaluated the performance of several classifiers for predicting the root causes for the given bug reports. Linear Support Vector machines achieved the highest mean precision (0.74) and recall (0.72) scores. The created bug data set and classification are publicly available.

A Hamiltonian Monte Carlo Model for Imputation and Augmentation of Healthcare Data arxiv:2103.02349 📈 3

Narges Pourshahrokhi, Samaneh Kouchaki, Kord M. Kober, Christine Miaskowski, Payam Barnaghi

**Abstract:** Missing values exist in nearly all clinical studies because data for a variable or question are not collected or not available. Inadequate handling of missing values can lead to biased results and loss of statistical power in analysis. Existing models usually do not consider privacy concerns or do not utilise the inherent correlations across multiple features to impute the missing values. In healthcare applications, we are usually confronted with high dimensional and sometimes small sample size datasets that need more effective augmentation or imputation techniques. Besides, imputation and augmentation processes are traditionally conducted individually. However, imputing missing values and augmenting data can significantly improve generalisation and avoid bias in machine learning models. A Bayesian approach to impute missing values and create augmented samples in high dimensional healthcare data is proposed in this work. We propose folded Hamiltonian Monte Carlo (F-HMC) with Bayesian inference as a more practical approach to process the cross-dimensional relations by applying a random walk and Hamiltonian dynamics to adapt posterior distribution and generate large-scale samples. The proposed method is applied to a cancer symptom assessment dataset and confirmed to enrich the quality of data in precision, accuracy, recall, F1 score, and propensity metric.

LQResNet: A Deep Neural Network Architecture for Learning Dynamic Processes arxiv:2103.02249 📈 3

Pawan Goyal, Peter Benner

**Abstract:** Mathematical modeling is an essential step, for example, to analyze the transient behavior of a dynamical process and to perform engineering studies such as optimization and control. With the help of first-principles and expert knowledge, a dynamic model can be built, but for complex dynamic processes, appearing, e.g., in biology, chemical plants, neuroscience, financial markets, this often remains an onerous task. Hence, data-driven modeling of the dynamic process becomes an attractive choice and is supported by the rapid advancement in sensor and measurement technology. A data-driven approach, namely the operator inference framework, models a dynamic process, where a particular structure of the nonlinear term is assumed. In this work, we suggest combining the operator inference with certain deep neural network approaches to infer the unknown nonlinear dynamics of the system. The approach uses recent advancements in deep learning and prior knowledge of the process where available. We also briefly discuss several extensions and advantages of the proposed methodology. We demonstrate that the proposed methodology accomplishes the desired tasks for dynamic processes encountered in neural dynamics and the glycolytic oscillator.

RGB Matters: Learning 7-DoF Grasp Poses on Monocular RGBD Images arxiv:2103.02184 📈 3

Minghao Gou, Hao-Shu Fang, Zhanda Zhu, Sheng Xu, Chenxi Wang, Cewu Lu

**Abstract:** General object grasping is an important yet unsolved problem in the field of robotics. Most of the current methods either generate grasp poses with few DoF that fail to cover most of the successful grasps, or only take the unstable depth image or point cloud as input which may lead to poor results in some cases. In this paper, we propose RGBD-Grasp, a pipeline that solves this problem by decoupling 7-DoF grasp detection into two sub-tasks where RGB and depth information are processed separately. In the first stage, an encoder-decoder-like convolutional neural network, Angle-View Net (AVN), is proposed to predict the SO(3) orientation of the gripper at every location of the image. Subsequently, a Fast Analytic Searching (FAS) module calculates the opening width and the distance of the gripper to the grasp point. By decoupling the grasp detection problem and introducing the stable RGB modality, our pipeline alleviates the requirement for the high-quality depth image and is robust to depth sensor noise. We achieve state-of-the-art results on the GraspNet-1Billion dataset compared with several baselines. Real robot experiments on a UR5 robot with an Intel Realsense camera and a Robotiq two-finger gripper show high success rates for both single object scenes and cluttered scenes. Our code and trained model will be made publicly available.

Helicopter Track Identification with Autoencoder arxiv:2103.04768 📈 2

Liya Wang, Panta Lucic, Keith Campbell, Craig Wanke

**Abstract:** Computing power, big data, and advancement of algorithms have led to a renewed interest in artificial intelligence (AI), especially in deep learning (DL). The success of DL lies largely in data representation because different representations can indicate to a degree the different explanatory factors of variation behind the data. In the last few years, the most successful story in DL has been supervised learning. However, to apply supervised learning, one challenge is that data labels are expensive to get, noisy, or only partially available. Considering that we human beings learn in an unsupervised way, self-supervised learning methods have garnered a lot of attention recently. A dominant force in self-supervised learning is the autoencoder, which has multiple uses (e.g., data representation, anomaly detection, denoising). This research explored the application of an autoencoder to learn effective data representation of helicopter flight track data, and then to support helicopter track identification. Our testing results are promising. For example, at Phoenix Deer Valley (DVT) airport, where 70% of recorded flight tracks have missing aircraft types, the autoencoder can help to identify twenty-two times more helicopters than otherwise detectable using rule-based methods; for Grand Canyon West Airport (1G4), the autoencoder can identify thirteen times more helicopters than a current rule-based approach. Our approach can also identify mislabeled aircraft types in the flight track data and find true types for records with pseudo aircraft type labels such as HELO. With improved labelling, studies using these data sets can produce more reliable results.

Towards Deepening Graph Neural Networks: A GNTK-based Optimization Perspective arxiv:2103.03113 📈 2

Wei Huang, Yayong Li, Weitao Du, Richard Yi Da Xu, Jie Yin, Ling Chen, Miao Zhang

**Abstract:** Graph convolutional networks (GCNs) and their variants have achieved great success in dealing with graph-structured data. However, it is well known that deep GCNs suffer from the over-smoothing problem, where node representations tend to be indistinguishable as more layers are stacked up. The theoretical research to date on deep GCNs has focused primarily on expressive power rather than trainability, an optimization perspective. Compared to expressivity, trainability attempts to address a more fundamental question: given a sufficiently expressive space of models, can we successfully find a good solution with a gradient descent-based optimizer? This work fills this gap by exploiting the Graph Neural Tangent Kernel (GNTK), which governs the optimization trajectory under gradient descent for wide GCNs. We formulate the asymptotic behaviors of GNTK in the large depth, which enables us to reveal that the trainability of wide and deep GCNs drops at an exponential rate in the optimization process. Additionally, we extend our theoretical framework to analyze residual connection-like techniques, which are found to be only able to mildly mitigate the exponential decay of trainability. To overcome the exponential decay problem more fundamentally, we propose Critical DropEdge, a connectivity-aware and graph-adaptive sampling method, inspired by our theoretical insights on trainability. Experimental evaluation consistently confirms that using our proposed method can achieve better results compared to relevant counterparts with both infinite-width and finite-width.

STEP: Stochastic Traversability Evaluation and Planning for Risk-Aware Off-road Navigation arxiv:2103.02828 📈 2

David D. Fan, Kyohei Otsu, Yuki Kubo, Anushri Dixit, Joel Burdick, Ali-Akbar Agha-Mohammadi

**Abstract:** Although ground robotic autonomy has gained widespread usage in structured and controlled environments, autonomy in unknown and off-road terrain remains a difficult problem. Extreme, off-road, and unstructured environments such as undeveloped wilderness, caves, and rubble pose unique and challenging problems for autonomous navigation. To tackle these problems we propose an approach for assessing traversability and planning a safe, feasible, and fast trajectory in real-time. Our approach, which we name STEP (Stochastic Traversability Evaluation and Planning), relies on: 1) rapid uncertainty-aware mapping and traversability evaluation, 2) tail risk assessment using the Conditional Value-at-Risk (CVaR), and 3) efficient risk and constraint-aware kinodynamic motion planning using sequential quadratic programming-based (SQP) model predictive control (MPC). We analyze our method in simulation and validate its efficacy on wheeled and legged robotic platforms exploring extreme terrains including an abandoned subway and an underground lava tube.
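
The tail risk measure named in the abstract, Conditional Value-at-Risk (CVaR), can be illustrated on a finite sample of costs. The sketch below uses one common empirical convention, the mean of the worst `ceil((1 - alpha) * n)` samples. The function name and the sample traversal costs are assumptions for illustration, and the paper's own formulation over traversability maps may differ.

```python
import math


def empirical_cvar(costs, alpha):
    """Empirical CVaR at level alpha: mean of the worst (1 - alpha) tail.

    Higher cost is treated as worse. alpha = 0 reduces to the plain mean;
    alpha close to 1 approaches the single worst sample.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    ordered = sorted(costs, reverse=True)
    # Keep at least one sample so the tail is never empty.
    k = max(1, math.ceil((1.0 - alpha) * len(ordered)))
    tail = ordered[:k]
    return sum(tail) / len(tail)


# Hypothetical traversal costs sampled for one candidate path.
costs = [1.0, 2.0, 3.0, 10.0]
mean_cost = empirical_cvar(costs, alpha=0.0)  # average over all samples
tail_cost = empirical_cvar(costs, alpha=0.5)  # average of the worst half
```

A planner that ranks paths by `tail_cost` rather than `mean_cost` is penalized by rare but severe outcomes (the 10.0 sample here), which is the risk-aware behaviour the abstract describes.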

Comparing the Value of Labeled and Unlabeled Data in Method-of-Moments Latent Variable Estimation arxiv:2103.02761 📈 2

Mayee F. Chen, Benjamin Cohen-Wang, Stephen Mussmann, Frederic Sala, Christopher Ré

**Abstract:** Labeling data for modern machine learning is expensive and time-consuming. Latent variable models can be used to infer labels from weaker, easier-to-acquire sources operating on unlabeled data. Such models can also be trained using labeled data, presenting a key question: should a user invest in few labeled or many unlabeled points? We answer this via a framework centered on model misspecification in method-of-moments latent variable estimation. Our core result is a bias-variance decomposition of the generalization error, which shows that the unlabeled-only approach incurs additional bias under misspecification. We then introduce a correction that provably removes this bias in certain cases. We apply our decomposition framework to three scenarios -- well-specified, misspecified, and corrected models -- to 1) choose between labeled and unlabeled data and 2) learn from their combination. We observe theoretically and with synthetic experiments that for well-specified models, labeled points are worth a constant factor more than unlabeled points. With misspecification, however, their relative value is higher due to the additional bias but can be reduced with correction. We also apply our approach to study real-world weak supervision techniques for dataset construction.

Worsening Perception: Real-time Degradation of Autonomous Vehicle Perception Performance for Simulation of Adverse Weather Conditions arxiv:2103.02760 📈 2

Ivan Fursa, Elias Fandi, Valentina Musat, Jacob Culley, Enric Gil, Izzeddin Teeti, Louise Bilous, Isaac Vander Sluis, Alexander Rast, Andrew Bradley

**Abstract:** Autonomous vehicles rely heavily upon their perception subsystems to see the environment in which they operate. Unfortunately, the effect of variable weather conditions presents a significant challenge to object detection algorithms, and thus it is imperative to test the vehicle extensively in all conditions which it may experience. However, development of robust autonomous vehicle subsystems requires repeatable, controlled testing - while real weather is unpredictable and cannot be scheduled. Real-world testing in adverse conditions is an expensive and time-consuming task, often requiring access to specialist facilities. Simulation is commonly relied upon as a substitute, with increasingly visually realistic representations of the real-world being developed. In the context of the complete autonomous vehicle control pipeline, subsystems downstream of perception need to be tested with accurate recreations of the perception system output, rather than focusing on subjective visual realism of the input - whether in simulation or the real world. This study develops the untapped potential of a lightweight weather augmentation method in an autonomous racing vehicle - focusing not on visual accuracy, but rather the effect upon perception subsystem performance in real time. With minimal adjustment, the prototype developed in this study can replicate the effects of water droplets on the camera lens, and fading light conditions. This approach introduces a latency of less than 8 ms using compute hardware well suited to being carried in the vehicle - rendering it ideal for real-time implementation that can be run during experiments in simulation, and augmented reality testing in the real world.

Malware Classification with Word Embedding Features arxiv:2103.02711 📈 2

Aparna Sunil Kale, Fabio Di Troia, Mark Stamp

**Abstract:** Malware classification is an important and challenging problem in information security. Modern malware classification techniques rely on machine learning models that can be trained on features such as opcode sequences, API calls, and byte $n$-grams, among many others. In this research, we consider opcode features. We implement hybrid machine learning techniques, where we engineer feature vectors by training hidden Markov models -- a technique that we refer to as HMM2Vec -- and Word2Vec embeddings on these opcode sequences. The resulting HMM2Vec and Word2Vec embedding vectors are then used as features for classification algorithms. Specifically, we consider support vector machine (SVM), $k$-nearest neighbor ($k$-NN), random forest (RF), and convolutional neural network (CNN) classifiers. We conduct substantial experiments over a variety of malware families. Our experiments extend well beyond any previous work in this field.

Natural Language Understanding for Argumentative Dialogue Systems in the Opinion Building Domain arxiv:2103.02691 📈 2

Waheed Ahmed Abro, Annalena Aicher, Niklas Rach, Stefan Ultes, Wolfgang Minker, Guilin Qi

**Abstract:** This paper introduces a natural language understanding (NLU) framework for argumentative dialogue systems in the information-seeking and opinion building domain. Our approach distinguishes multiple user intents and identifies system arguments the user refers to in his or her natural language utterances. Our model is applicable in an argumentative dialogue system that allows the user to inform him-/herself about and build his/her opinion towards a controversial topic. In order to evaluate the proposed approach, we collect user utterances for the interaction with the respective system and label them with intent and reference argument in an extensive online study. The data collection includes multiple topics and two different user types (native speakers from the UK and non-native speakers from China). The evaluation indicates a clear advantage of the utilized techniques over baseline approaches, as well as robustness of the proposed approach against new topics and different language proficiency and cultural background of the user.

A Novel Context-Aware Multimodal Framework for Persian Sentiment Analysis arxiv:2103.02636 📈 2

Kia Dashtipour, Mandar Gogate, Erik Cambria, Amir Hussain

**Abstract:** Most recent works on sentiment analysis have exploited the text modality. However, millions of hours of video recordings posted on social media platforms every day hold vital unstructured information that can be exploited to more effectively gauge public perception. Multimodal sentiment analysis offers an innovative solution to computationally understand and harvest sentiments from videos by contextually exploiting audio, visual and textual cues. In this paper, we first present a first-of-its-kind Persian multimodal dataset comprising more than 800 utterances, as a benchmark resource for researchers to evaluate multimodal sentiment analysis approaches in the Persian language. Second, we present a novel context-aware multimodal sentiment analysis framework that simultaneously exploits acoustic, visual and textual cues to more accurately determine the expressed sentiment. We employ both decision-level (late) and feature-level (early) fusion methods to integrate affective cross-modal information. Experimental results demonstrate that the contextual integration of multimodal features such as textual, acoustic and visual features delivers better performance (91.39%) compared to unimodal features (89.24%).

ICAM-reg: Interpretable Classification and Regression with Feature Attribution for Mapping Neurological Phenotypes in Individual Scans arxiv:2103.02561 📈 2

Cher Bass, Mariana da Silva, Carole Sudre, Logan Z. J. Williams, Petru-Daniel Tudosiu, Fidel Alfaro-Almagro, Sean P. Fitzgibbon, Matthew F. Glasser, Stephen M. Smith, Emma C. Robinson

**Abstract:** An important goal of medical imaging is to be able to precisely detect patterns of disease specific to individual scans; however, this is challenged in brain imaging by the degree of heterogeneity of shape and appearance. Traditional methods, based on image registration to a global template, historically fail to detect variable features of disease, as they utilise population-based analyses, suited primarily to studying group-average effects. In this paper we therefore take advantage of recent developments in generative deep learning to develop a method for simultaneous classification, or regression, and feature attribution (FA). Specifically, we explore the use of a VAE-GAN translation network called ICAM, to explicitly disentangle class relevant features from background confounds for improved interpretability and regression of neurological phenotypes. We validate our method on the tasks of Mini-Mental State Examination (MMSE) cognitive test score prediction for the Alzheimer's Disease Neuroimaging Initiative (ADNI) cohort, as well as brain age prediction, for both neurodevelopment and neurodegeneration, using the developing Human Connectome Project (dHCP) and UK Biobank datasets. We show that the generated FA maps can be used to explain outlier predictions and demonstrate that the inclusion of a regression module improves the disentanglement of the latent space. Our code is freely available on Github https://github.com/CherBass/ICAM.

Enabling Visual Action Planning for Object Manipulation through Latent Space Roadmap arxiv:2103.02554 📈 2

Martina Lippi, Petra Poklukar, Michael C. Welle, Anastasiia Varava, Hang Yin, Alessandro Marino, Danica Kragic

**Abstract:** We present a framework for visual action planning of complex manipulation tasks with high-dimensional state spaces, focusing on manipulation of deformable objects. We propose a Latent Space Roadmap (LSR) for task planning which is a graph-based structure globally capturing the system dynamics in a low-dimensional latent space. Our framework consists of three parts: (1) a Mapping Module (MM) that maps observations given in the form of images into a structured latent space extracting the respective states as well as generates observations from the latent states, (2) the LSR which builds and connects clusters containing similar states in order to find the latent plans between start and goal states extracted by MM, and (3) the Action Proposal Module that complements the latent plan found by the LSR with the corresponding actions. We present a thorough investigation of our framework on simulated box stacking and rope/box manipulation tasks, and a folding task executed on a real robot.

Multi-task Learning by Leveraging the Semantic Information arxiv:2103.02546 📈 2

Fan Zhou, Brahim Chaib-draa, Boyu Wang

**Abstract:** One crucial objective of multi-task learning is to align distributions across tasks so that the information between them can be transferred and shared. However, existing approaches only focused on matching the marginal feature distribution while ignoring the semantic information, which may hinder the learning performance. To address this issue, we propose to leverage the label information in multi-task learning by exploring the semantic conditional relations among tasks. We first theoretically analyze the generalization bound of multi-task learning based on the notion of Jensen-Shannon divergence, which provides new insights into the value of label information in multi-task learning. Our analysis also leads to a concrete algorithm that jointly matches the semantic distribution and controls label distribution divergence. To confirm the effectiveness of the proposed method, we first compare the algorithm with several baselines on some benchmarks and then test the algorithms under label space shift conditions. Empirical results demonstrate that the proposed method could outperform most baselines and achieve state-of-the-art performance, particularly showing the benefits under the label shift conditions.

Reservoir Computing with Superconducting Electronics arxiv:2103.02522 📈 2

Graham E. Rowlands, Minh-Hai Nguyen, Guilhem J. Ribeill, Andrew P. Wagner, Luke C. G. Govia, Wendson A. S. Barbosa, Daniel J. Gauthier, Thomas A. Ohki

**Abstract:** The rapidity and low power consumption of superconducting electronics make them an ideal substrate for physical reservoir computing, which commandeers the computational power inherent to the evolution of a dynamical system for the purposes of performing machine learning tasks. We focus on a subset of superconducting circuits that exhibit soliton-like dynamics in simple transmission line geometries. With numerical simulations we demonstrate the effectiveness of these circuits in performing higher-order parity calculations and channel equalization at rates approaching 100 Gb/s. The availability of a proven superconducting logic scheme considerably simplifies the path to a fully integrated reservoir computing platform and makes superconducting reservoirs an enticing substrate for high rate signal processing applications.

Arthroscopic Multi-Spectral Scene Segmentation Using Deep Learning arxiv:2103.02465 📈 2

Shahnewaz Ali, Dr. Yaqub Jonmohamadi, Yu Takeda, Jonathan Roberts, Ross Crawford, Cameron Brown, Dr. Ajay K. Pandey

**Abstract:** Knee arthroscopy is a minimally invasive surgical (MIS) procedure performed to treat knee-joint ailments. The lack of visual information about the surgical site obtained from miniaturized cameras makes this surgical procedure more complex. The knee cavity is a very confined space; therefore, surgical scenes are captured at close proximity. The insignificant context of the knee atlas often makes these scenes unrecognizable; as a consequence, unintentional tissue damage often occurs and new surgeons face a long learning curve. Automatic context awareness through labeling of the surgical site can be an alternative to mitigate these drawbacks. However, previous studies confirm that the surgical site exhibits several limitations, among others a lack of discriminative contextual information such as texture and features, which drastically limits this vision task. Additionally, poor imaging conditions and a lack of accurate ground-truth labels also limit the accuracy. To mitigate these limitations of knee arthroscopy, in this work we propose a scene segmentation method that successfully segments multiple structures.

Land Cover Mapping in Limited Labels Scenario: A Survey arxiv:2103.02429 📈 2

Rahul Ghosh, Xiaowei Jia, Vipin Kumar

**Abstract:** Land cover mapping is essential for monitoring global environmental change and managing natural resources. Unfortunately, traditional classification models are plagued by limited training data available in existing land cover products and data heterogeneity over space and time. In this survey, we provide a structured and comprehensive overview of challenges in land cover mapping and machine learning methods used to address these problems. We also discuss the gaps and opportunities that exist for advancing research in this promising direction.

Real-World Single Image Super-Resolution: A Brief Review arxiv:2103.02368 📈 2

Honggang Chen, Xiaohai He, Linbo Qing, Yuanyuan Wu, Chao Ren, Ce Zhu

**Abstract:** Single image super-resolution (SISR), which aims to reconstruct a high-resolution (HR) image from a low-resolution (LR) observation, has been an active research topic in the area of image processing in recent decades. Particularly, deep learning-based super-resolution (SR) approaches have drawn much attention and have greatly improved the reconstruction performance on synthetic data. Recent studies show that simulation results on synthetic data usually overestimate the capacity to super-resolve real-world images. In this context, more and more researchers devote themselves to developing SR approaches for realistic images. This article aims to provide a comprehensive review of real-world single image super-resolution (RSISR). More specifically, this review covers the critical publicly available datasets and assessment metrics for RSISR, and four major categories of RSISR methods, namely the degradation modeling-based RSISR, image pairs-based RSISR, domain translation-based RSISR, and self-learning-based RSISR. Comparisons are also made among representative RSISR methods on benchmark datasets, in terms of both reconstruction quality and computational efficiency. Besides, we discuss challenges and promising research topics on RSISR.

Critical Parameters for Scalable Distributed Learning with Large Batches and Asynchronous Updates arxiv:2103.02351 📈 2

Sebastian U. Stich, Amirkeivan Mohtashami, Martin Jaggi

**Abstract:** It has been experimentally observed that the efficiency of distributed training with stochastic gradient descent (SGD) depends decisively on the batch size and -- in asynchronous implementations -- on the gradient staleness. In particular, it has been observed that the speedup saturates beyond a certain batch size and/or when the delays grow too large. We identify a data-dependent parameter that explains the speedup saturation in both these settings. Our comprehensive theoretical analysis, for strongly convex, convex and non-convex settings, unifies and generalizes prior work directions that often focused on only one of these two aspects. In particular, our approach allows us to derive improved speedup results under frequently considered sparsity assumptions. Our insights give rise to theoretically based guidelines on how the learning rates can be adjusted in practice. We show that our results are tight and illustrate key findings in numerical experiments.

Meta-Learning with Variational Bayes arxiv:2103.02265 📈 2

Lucas D. Lingle

**Abstract:** The field of meta-learning seeks to improve the ability of today's machine learning systems to adapt efficiently to small amounts of data. Typically this is accomplished by training a system with a parametrized update rule to improve a task-relevant objective based on supervision or a reward function. However, in many domains of practical interest, task data is unlabeled, or reward functions are unavailable. In this paper we introduce a new approach to address the more general problem of generative meta-learning, which we argue is an important prerequisite for obtaining human-level cognitive flexibility in artificial agents, and can benefit many practical applications along the way. Our contribution leverages the AEVB framework and mean-field variational Bayes, and creates fast-adapting latent-space generative models. At the heart of our contribution is a new result, showing that for a broad class of deep generative latent variable models, the relevant VB updates do not depend on any generative neural network. The theoretical merits of our approach are reflected in empirical experiments.

Computation Resource Allocation Solution in Recommender Systems arxiv:2103.02259 📈 2

Xun Yang, Yunli Wang, Cheng Chen, Qing Tan, Chuan Yu, Jian Xu, Xiaoqiang Zhu

**Abstract:** Recommender systems rely heavily on increasing computation resources to improve their business goal. By deploying computation-intensive models and algorithms, these systems are able to infer user interests and exhibit certain ads or commodities from the candidate set to maximize their business goals. However, such systems are facing two challenges in achieving their goals. On the one hand, facing massive online requests, computation-intensive models and algorithms are pushing their computation resources to the limit. On the other hand, the response time of these systems is strictly limited to a short period, e.g. 300 milliseconds in our real system, which is also being exhausted by the increasingly complex models and algorithms. In this paper, we propose the computation resource allocation solution (CRAS) that maximizes the business goal with limited computation resources and response time. We comprehensively illustrate the problem and formulate it as an optimization problem with multiple constraints, which can be broken down into independent sub-problems. To solve the sub-problems, we propose the revenue function to facilitate the theoretical analysis, and obtain the optimal computation resource allocation strategy. To address the applicability issues, we devise the feedback control system to help our strategy constantly adapt to the changing online environment. The effectiveness of our method is verified by extensive experiments based on the real dataset from Taobao.com. We also deploy our method in the display advertising system of Alibaba. The online results show that our computation resource allocation solution achieves significant business goal improvement without any increment of computation cost, which demonstrates the efficacy of our method in real industrial practice.

Automatically detecting the conflicts between software requirements based on finer semantic analysis arxiv:2103.02255 📈 2

Weize Guo, Li Zhang, Xiaoli Lian

**Abstract:** Context: Conflicts between software requirements bring uncertainties to product development. Some great approaches have been proposed to identify these conflicts. However, they usually require the software requirements to be represented with specific templates and/or depend on other external sources, which are often difficult to build for many projects in practice. Objective: We propose an approach, Finer Semantic Analysis-based Requirements Conflict Detector (FSARC), to automatically detect conflicts between given natural language functional requirements by analyzing their finer semantic compositions. Method: We build a harmonized semantic meta-model of functional requirements in the form of an eight-tuple. Then we propose algorithms to automatically analyze the linguistic features of requirements and to annotate the semantic elements for their semantic model construction. We define seven types of conflicts along with their heuristic detection rules, based on their text patterns and semantic dependencies. Finally, we design and implement the algorithm for conflict detection. Results: The experiment with four requirement datasets illustrates that the recall of FSARC is nearly 100% and the average precision is 83.88% on conflict detection. Conclusion: We provide a useful tool for detecting conflicts between natural language functional requirements to improve the quality of the final requirements set. Besides, our approach is capable of transforming natural language functional requirements into eight semantic tuples, which is useful not only for the detection of conflicts between requirements but also for other tasks, such as constructing associations between requirements.

An Attention Based Neural Network for Code Switching Detection: English & Roman Urdu arxiv:2103.02252 📈 2

Aizaz Hussain, Muhammad Umair Arshad

**Abstract:** Code-switching is a common phenomenon among people with diverse lingual backgrounds and is widely used on the internet for communication purposes. In this paper, we present a Recurrent Neural Network combined with the Attention Model for Language Identification in Code-Switched Data in English and low resource Roman Urdu. The attention model enables the architecture to learn the important features of the languages, hence classifying the code-switched data. We demonstrate our approach by comparing the results with state-of-the-art models, i.e., Hidden Markov Models, Conditional Random Fields, and Bidirectional LSTMs. The model evaluation, using confusion matrix metrics, showed that the attention mechanism provides improved precision and accuracy compared to the other models.

An Iterative Contextualization Algorithm with Second-Order Attention arxiv:2103.02190 📈 2

Diego Maupomé, Marie-Jean Meurs

**Abstract:** Combining the representations of the words that make up a sentence into a cohesive whole is difficult, since it needs to account for the order of words, and to establish how the words present relate to each other. The solution we propose consists in iteratively adjusting the context. Our algorithm starts with a presumably erroneous value of the context, and adjusts this value with respect to the tokens at hand. In order to achieve this, representations of words are built combining their symbolic embedding with a positional encoding into single vectors. The algorithm then iteratively weighs and aggregates these vectors using our novel second-order attention mechanism. Our models report strong results in several well-known text classification tasks.

Auditory Attention Decoding from EEG using Convolutional Recurrent Neural Network arxiv:2103.02183 📈 2

Zhen Fu, Bo Wang, Xihong Wu, Jing Chen

**Abstract:** The auditory attention decoding (AAD) approach was proposed to determine the identity of the attended talker in a multi-talker scenario by analyzing electroencephalography (EEG) data. Although the linear model-based method has been widely used in AAD, the linear assumption was considered oversimplified and the decoding accuracy remained lower for shorter decoding windows. Recently, nonlinear models based on deep neural networks (DNN) have been proposed to solve this problem. However, these models did not fully utilize both the spatial and temporal features of EEG, and the interpretability of DNN models was rarely investigated. In this paper, we propose a novel convolutional recurrent neural network (CRNN) based regression model and classification model, and compare them with both the linear model and the state-of-the-art DNN models. Results showed that our proposed CRNN-based classification model outperformed others for shorter decoding windows (around 90% for 2 s and 5 s). Although worse than classification models, the decoding accuracy of the proposed CRNN-based regression model was about 5% greater than other regression models. The interpretability of DNN models was also investigated by visualizing layers' weights.

Hate, Obscenity, and Insults: Measuring the Exposure of Children to Inappropriate Comments in YouTube arxiv:2103.09050 📈 1

Sultan Alshamrani, Ahmed Abusnaina, Mohammed Abuhamad, Daehun Nyang, David Mohaisen

**Abstract:** Social media has become an essential part of the daily routines of children and adolescents. Moreover, enormous efforts have been made to ensure the psychological and emotional well-being of young users as well as their safety when interacting with various social media platforms. In this paper, we investigate the exposure of those users to inappropriate comments posted on YouTube videos targeting this demographic. We collected a large-scale dataset of approximately four million records and studied the presence of five age-inappropriate categories and the amount of exposure to each category. Using natural language processing and machine learning techniques, we constructed ensemble classifiers that achieved high accuracy in detecting inappropriate comments. Our results show a large percentage of worrisome comments with inappropriate content: we found 11% of the comments on children's videos to be toxic, highlighting the importance of monitoring comments, particularly on children's platforms.

Performance and Complexity Analysis of bi-directional Recurrent Neural Network Models vs. Volterra Nonlinear Equalizers in Digital Coherent Systems arxiv:2103.03832 📈 1

Stavros Deligiannidis, Charis Mesaritakis, Adonis Bogris

**Abstract:** We investigate the complexity and performance of recurrent neural network (RNN) models as post-processing units for the compensation of fibre nonlinearities in digital coherent systems carrying polarization multiplexed 16-QAM and 32-QAM signals. We evaluate three bi-directional RNN models, namely the bi-LSTM, bi-GRU and bi-Vanilla-RNN and show that all of them are promising nonlinearity compensators especially in dispersion unmanaged systems. Our simulations show that during inference the three models provide similar compensation performance, therefore in real-life systems the simplest scheme based on Vanilla-RNN units should be preferred. We compare bi-Vanilla-RNN with Volterra nonlinear equalizers and exhibit its superiority both in terms of performance and complexity, thus highlighting that RNN processing is a very promising pathway for the upgrade of long-haul optical communication systems utilizing coherent detection.

A Multi-Modal Respiratory Disease Exacerbation Prediction Technique Based on a Spatio-Temporal Machine Learning Architecture arxiv:2103.03086 📈 1

Rohan Tan Bhowmik

**Abstract:** Chronic respiratory diseases, such as chronic obstructive pulmonary disease and asthma, are a serious health crisis, affecting a large number of people globally and inflicting major costs on the economy. Current methods for assessing the progression of respiratory symptoms are either subjective and inaccurate, or complex and cumbersome, and do not incorporate environmental factors. Lacking predictive assessments and early intervention, unexpected exacerbations can lead to hospitalizations and high medical costs. This work presents a multi-modal solution for predicting the exacerbation risks of respiratory diseases, such as COPD, based on a novel spatio-temporal machine learning architecture for real-time and accurate respiratory events detection, and tracking of local environmental and meteorological data and trends. The proposed new machine learning architecture blends key attributes of both convolutional and recurrent neural networks, allowing extraction of both spatial and temporal features encoded in respiratory sounds, thereby leading to accurate classification and tracking of symptoms. Combined with the data from environmental and meteorological sensors, and a predictive model based on retrospective medical studies, this solution can assess and provide early warnings of respiratory disease exacerbations. This research will improve the quality of patients' lives through early medical intervention, thereby reducing hospitalization rates and medical costs.

PET Image Reconstruction with Multiple Kernels and Multiple Kernel Space Regularizers arxiv:2103.02813 📈 1

Shiyao Guo, Yuxia Sheng, Shenpeng Li, Li Chai, Jingxin Zhang

**Abstract:** Kernelized maximum-likelihood (ML) expectation maximization (EM) methods have recently gained prominence in PET image reconstruction, outperforming many previous state-of-the-art methods. But they are not immune to the problems of non-kernelized MLEM methods in potentially large reconstruction error and high sensitivity to iteration number. This paper demonstrates these problems by theoretical reasoning and experimental results, and provides a novel solution to them. The solution is a regularized kernelized MLEM with multiple kernel matrices and multiple kernel space regularizers that can be tailored for different applications. To reduce the reconstruction error and the sensitivity to iteration number, we present a general class of multi-kernel matrices and two regularizers consisting of kernel image dictionary and kernel image Laplacian quadratic, and use them to derive the single-kernel regularized EM and multi-kernel regularized EM algorithms for PET image reconstruction. These new algorithms are derived using the technical tools of multi-kernel combination in machine learning, image dictionary learning in sparse coding, and graph Laplacian quadratic in graph signal processing. Extensive tests and comparisons on the simulated and in vivo data are presented to validate and evaluate the new algorithms, and demonstrate their superior performance and advantages over the kernelized MLEM and other conventional methods.

Contrast Adaptive Tissue Classification by Alternating Segmentation and Synthesis arxiv:2103.02767 📈 1

Dzung L. Pham, Yi-Yu Chou, Blake E. Dewey, Daniel S. Reich, John A. Butman, Snehashis Roy

**Abstract:** Deep learning approaches to the segmentation of magnetic resonance images have shown significant promise in automating the quantitative analysis of brain images. However, a continuing challenge has been their sensitivity to the variability of acquisition protocols. Attempting to segment images that have different contrast properties from those within the training data generally leads to significantly reduced performance. Furthermore, heterogeneous data sets cannot be easily evaluated because the quantitative variation due to acquisition differences often dwarfs the variation due to the biological differences that one seeks to measure. In this work, we describe an approach using alternating segmentation and synthesis steps that adapts the contrast properties of the training data to the input image. This allows input images that do not resemble the training data to be more consistently segmented. A notable advantage of this approach is that only a single example of the acquisition protocol is required to adapt to its contrast properties. We demonstrate the efficacy of our approach using brain images from a set of human subjects scanned with two different T1-weighted volumetric protocols.

A toolbox for neuromorphic sensing in robotics arxiv:2103.02751 📈 1

Julien Dupeyroux, Stein Stroobants, Guido de Croon

**Abstract:** The third generation of artificial intelligence (AI) introduced by neuromorphic computing is revolutionizing the way robots and autonomous systems can sense the world, process the information, and interact with their environment. The promises of high flexibility, energy efficiency, and robustness of neuromorphic systems are widely supported by software tools for simulating spiking neural networks, and hardware integration (neuromorphic processors). Yet, while efforts have been made on neuromorphic vision (event-based cameras), it is worth noting that most of the sensors available for robotics remain inherently incompatible with neuromorphic computing, where information is encoded into spikes. To facilitate the use of traditional sensors, we need to convert the output signals into streams of spikes, i.e., a series of events (+1, -1) along with their corresponding timestamps. In this paper, we propose a review of the coding algorithms from a robotics perspective, further supported by a benchmark to assess their performance. We also introduce a ROS (Robot Operating System) toolbox to encode and decode input signals coming from any type of sensor available on a robot. This initiative is meant to stimulate and facilitate robotic integration of neuromorphic AI, with the opportunity to adapt traditional off-the-shelf sensors to spiking neural nets within one of the most powerful robotic tools, ROS.
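
The conversion described above, from a sampled sensor signal to a stream of (+1, -1) events with timestamps, can be illustrated with a threshold-based (send-on-delta) scheme. This is a minimal sketch of one common temporal coding idea, not the toolbox's actual implementation; the function name `delta_encode` and its parameters are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def delta_encode(
    samples: Sequence[float],
    timestamps: Sequence[float],
    threshold: float,
) -> List[Tuple[float, int]]:
    """Encode a sampled signal as (timestamp, polarity) events.

    Hypothetical helper: an event of polarity +1 (or -1) is emitted each
    time the signal has risen (or fallen) by one full `threshold` relative
    to the last reference level.
    """
    if len(samples) != len(timestamps):
        raise ValueError("samples and timestamps must have the same length")
    if threshold <= 0:
        raise ValueError("threshold must be positive")

    events: List[Tuple[float, int]] = []
    if not samples:
        return events

    reference = samples[0]  # level the next sample is compared against
    for value, t in zip(samples[1:], timestamps[1:]):
        # A large jump between two samples produces several events
        # that share the same timestamp.
        while value - reference >= threshold:
            reference += threshold
            events.append((t, 1))
        while reference - value >= threshold:
            reference -= threshold
            events.append((t, -1))
    return events


events = delta_encode([0.0, 0.5, 1.0, 0.0], [0, 1, 2, 3], threshold=0.5)
```

A smaller threshold yields more events and a finer reconstruction of the signal, at the cost of a denser spike stream.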

Malware Classification Using Long Short-Term Memory Models arxiv:2103.02746 📈 1

Dennis Dang, Fabio Di Troia, Mark Stamp

**Abstract:** Signature and anomaly based techniques are the quintessential approaches to malware detection. However, these techniques have become increasingly ineffective as malware has become more sophisticated and complex. Researchers have therefore turned to deep learning to construct better performing models. In this paper, we create four different long short-term memory (LSTM) based models and train each to classify malware samples from 20 families. Our features consist of opcodes extracted from malware executables. We employ techniques used in natural language processing (NLP), including word embedding and bidirectional LSTMs (biLSTM), and we also use convolutional neural networks (CNN). We find that a model consisting of word embedding, biLSTMs, and CNN layers performs best in our malware classification experiments.

Self-play Learning Strategies for Resource Assignment in Open-RAN Networks arxiv:2103.02649 📈 1

Xiaoyang Wang, Jonathan D Thomas, Robert J Piechocki, Shipra Kapoor, Raul Santos-Rodriguez, Arjun Parekh

**Abstract:** Open Radio Access Network (ORAN) is being developed with an aim to democratise access and lower the cost of future mobile data networks, supporting network services with various QoS requirements, such as massive IoT and URLLC. In ORAN, network functionality is dis-aggregated into remote units (RUs), distributed units (DUs) and central units (CUs), which allows flexible software on Commercial-Off-The-Shelf (COTS) deployments. Furthermore, the mapping of variable RU requirements to local mobile edge computing centres for future centralized processing would significantly reduce the power consumption in cellular networks. In this paper, we study the RU-DU resource assignment problem in an ORAN system, modelled as a 2D bin packing problem. A deep reinforcement learning-based self-play approach is proposed to achieve efficient RU-DU resource management, with AlphaGo Zero inspired neural Monte-Carlo Tree Search (MCTS). Experiments on representative 2D bin packing environment and real sites data show that the self-play learning strategy achieves intelligent RU-DU resource assignment for different network conditions.

On the geometric and Riemannian structure of the spaces of group equivariant non-expansive operators arxiv:2103.02543 📈 1

Pasquale Cascarano, Patrizio Frosini, Nicola Quercioli, Amir Saki

**Abstract:** Group equivariant non-expansive operators have been recently proposed as basic components in topological data analysis and deep learning. In this paper we study some geometric properties of the spaces of group equivariant operators and show how a space $\mathcal{F}$ of group equivariant non-expansive operators can be endowed with the structure of a Riemannian manifold, so making available the use of gradient descent methods for the minimization of cost functions on $\mathcal{F}$. As an application of this approach, we also describe a procedure to select a finite set of representative group equivariant non-expansive operators in the considered manifold.

Recurrent Graph Neural Network Algorithm for Unsupervised Network Community Detection arxiv:2103.02520 📈 1

Stanislav Sobolevsky

**Abstract:** Network community detection often relies on optimizing partition quality functions, like modularity. This optimization appears to be a complex problem traditionally relying on discrete heuristics. And although the problem could be reformulated as continuous optimization, direct application of the standard optimization methods has limited efficiency in overcoming the numerous local extrema. However, the rise of deep learning and its applications to graphs offers new opportunities. And while graph neural networks have been used for supervised and unsupervised learning on networks, their application to modularity optimization has not been explored yet. This paper proposes a new variant of the recurrent graph neural network algorithm for unsupervised network community detection through modularity optimization. The new algorithm's performance is compared against a popular and fast Louvain method and a more efficient but slower Combo algorithm recently proposed by the author. The approach also serves as a proof-of-concept for the broader application of recurrent graph neural networks to unsupervised network optimization.
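
For reference, the partition quality function mentioned above can be computed directly. The sketch below uses the standard Newman modularity, Q = sum over communities c of [L_c / m - (d_c / 2m)^2], where m is the number of edges, L_c the number of edges inside c, and d_c the total degree of c. It assumes an undirected, unweighted graph given as an edge list; the function name and argument layout are illustrative and not taken from the paper.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def modularity(
    edges: Iterable[Tuple[Hashable, Hashable]],
    community: Dict[Hashable, Hashable],
) -> float:
    """Newman modularity of a partition of an undirected, unweighted graph.

    `edges` lists each undirected edge once; `community` maps every node
    to its community label. Both names are illustrative assumptions.
    """
    edge_list = list(edges)
    m = len(edge_list)
    if m == 0:
        raise ValueError("graph must contain at least one edge")

    internal = defaultdict(int)    # L_c: edges with both ends in c
    degree_sum = defaultdict(int)  # d_c: sum of node degrees in c
    for u, v in edge_list:
        cu, cv = community[u], community[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1

    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
q_split = modularity(edges, split)
q_single = modularity(edges, {node: "all" for node in range(6)})
```

Placing each triangle in its own community scores higher than lumping all nodes together, which is the kind of preference a modularity optimizer exploits.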

A Fault Localization and Debugging Support Framework driven by Bug Tracking Data arxiv:2103.02386 📈 1

Thomas Hirsch

**Abstract:** Fault localization has been identified as a major resource factor in the software development life cycle. Academic fault localization techniques are mostly unknown and unused in professional environments. Although manual debugging approaches can vary significantly depending on bug type (e.g. memory bugs or semantic bugs), these differences are not reflected in most existing fault localization tools. Little research has gone into automated identification of bug types to optimize the fault localization process. Further, existing fault localization techniques leverage historical data only for augmentation of suspiciousness rankings. This thesis aims to provide a fault localization framework by combining data from various sources to help developers in the fault localization process. To achieve this, a bug classification schema is introduced, benchmarks are created, and a novel fault localization method based on historical data is proposed.

Nonlinear MPC for Offset-Free Tracking of systems learned by GRU Neural Networks arxiv:2103.02383 📈 1

Fabio Bonassi, Caio Fabio Oliveira da Silva, Riccardo Scattolini

**Abstract:** The use of Recurrent Neural Networks (RNNs) for system identification has recently gathered increasing attention, thanks to their black-box modeling capabilities. Although RNNs have been fruitfully adopted in many applications, only a few works are devoted to providing rigorous theoretical foundations that justify their use for control purposes. The aim of this paper is to describe how stable Gated Recurrent Units (GRUs), a particular RNN architecture, can be trained and employed in a Nonlinear MPC framework to perform offset-free tracking of constant references with guaranteed closed-loop stability. The proposed approach is tested on a pH neutralization process benchmark, showing remarkable performance.

Decision-makers Processing of AI Algorithmic Advice: Automation Bias versus Selective Adherence arxiv:2103.02381 📈 1

Saar Alon-Barkat, Madalina Busuioc

**Abstract:** Artificial intelligence algorithms are increasingly adopted as decisional aids by public organisations, with the promise of overcoming biases of human decision-makers. At the same time, the use of algorithms may introduce new biases in the human-algorithm interaction. A key concern emerging from psychology studies regards human overreliance on algorithmic advice even in the face of warning signals and contradictory information from other sources (automation bias). A second concern regards decision-makers' inclination to selectively adopt algorithmic advice when it matches their pre-existing beliefs and stereotypes (selective adherence). To date, we lack rigorous empirical evidence about the prevalence of these biases in a public sector context. We assess these via two pre-registered experimental studies (N=1,509), simulating the use of algorithmic advice in decisions pertaining to the employment of school teachers in the Netherlands. In study 1, we test automation bias by exploring participants' adherence to a prediction of teachers' performance, which contradicts additional evidence, while comparing between two types of predictions: algorithmic vs. human-expert. We do not find evidence for automation bias. In study 2, we replicate these findings, and we also test selective adherence by manipulating the teachers' ethnic background. We find a propensity for adherence when the advice predicts low performance for a teacher of a negatively stereotyped ethnic minority, with no significant differences between algorithmic and human advice. Overall, our findings of selective, biased adherence belie the promise of neutrality that has propelled algorithm use in the public sector.

Temporal-Structure-Assisted Gradient Aggregation for Over-the-Air Federated Edge Learning arxiv:2103.02270 📈 1

Dian Fan, Xiaojun Yuan, Ying-Jun Angela Zhang

**Abstract:** In this paper, we investigate over-the-air model aggregation in a federated edge learning (FEEL) system. We introduce a Markovian probability model to characterize the intrinsic temporal structure of the model aggregation series. With this temporal probability model, we formulate the model aggregation problem as inferring, from a Bayesian perspective, the desired aggregated update given all the past observations. We develop a message passing based algorithm, termed temporal-structure-assisted gradient aggregation (TSA-GA), to fulfil this estimation task with low complexity and near-optimal performance. We further establish the state evolution (SE) analysis to characterize the behaviour of the proposed TSA-GA algorithm, and derive an explicit bound of the expected loss reduction of the FEEL system under certain standard regularity conditions. In addition, we develop an expectation maximization (EM) strategy to learn the unknown parameters in the Markovian model. We show that the proposed TSA-GA algorithm significantly outperforms the state-of-the-art, and is able to achieve learning performance comparable to the error-free benchmark in terms of both convergence rate and final test accuracy.

Decoding Event-related Potential from Ear-EEG Signals based on Ensemble Convolutional Neural Networks in Ambulatory Environment arxiv:2103.02197 📈 1

Young-Eun Lee, Seong-Whan Lee

**Abstract:** Recently, research on practical brain-computer interfaces has been actively carried out, especially in ambulatory environments. However, electroencephalography (EEG) signals are distorted by movement artifacts and electromyography signals when users are moving, which makes it hard to recognize human intention. In addition, as hardware issues are also challenging, ear-EEG has been developed for practical brain-computer interfaces and has been widely used. In this paper, we proposed ensemble-based convolutional neural networks in an ambulatory environment and analyzed the visual event-related potential responses in scalp- and ear-EEG in terms of statistical analysis and brain-computer interface performance. The brain-computer interface performance deteriorated by 3-14% when walking fast at 1.6 m/s. The proposed method achieved an average area under the curve of 0.728. The proposed method is also robust to the ambulatory environment and to imbalanced data.

Eye-gaze Estimation with HEOG and Neck EMG using Deep Neural Networks arxiv:2103.02186 📈 1

Zhen Fu, Bo Wang, Fei Chen, Xihong Wu, Jing Chen

**Abstract:** Hearing-impaired listeners usually have trouble attending to the target talker in multi-talker scenes, even with hearing aids (HAs). The problem can be solved with eye-gaze steering HAs, which require listeners to gaze at the target. In a situation where the head rotates, eye-gaze is subject to both saccade and head rotation. However, existing methods of eye-gaze estimation did not work reliably, since the listener's eye-gaze strategy varies and measurements of the two behaviors were not properly combined. Besides, existing methods were based on hand-crafted features, which could overlook some important information. In this paper, a head-fixed and a head-free experiment were conducted. We used horizontal electrooculography (HEOG) and neck electromyography (NEMG), which separately measured saccade and head rotation, to jointly estimate eye-gaze. Besides a traditional classifier and hand-crafted features, deep neural networks (DNN) were introduced to automatically extract features from intact waveforms. Evaluation results showed that when the input was HEOG with an inertial measurement unit, the best performance of our proposed DNN classifiers reached 93.3%; and when HEOG was combined with NEMG, the accuracy reached 72.6%, higher than that with HEOG (about 71.0%) or NEMG (about 35.7%) alone. These results indicated the feasibility of estimating eye-gaze with HEOG and NEMG.

On the Convergence and Optimality of Policy Gradient for Markov Coherent Risk arxiv:2103.02827 📈 0

Audrey Huang, Liu Leqi, Zachary C. Lipton, Kamyar Azizzadenesheli

**Abstract:** In order to model risk aversion in reinforcement learning, an emerging line of research adapts familiar algorithms to optimize coherent risk functionals, a class that includes conditional value-at-risk (CVaR). Because optimizing the coherent risk is difficult in Markov decision processes, recent work tends to focus on the Markov coherent risk (MCR), a time-consistent surrogate. While policy gradient (PG) updates have been derived for this objective, it remains unclear (i) whether PG finds a global optimum for MCR; (ii) how to estimate the gradient in a tractable manner. In this paper, we demonstrate that, in general, MCR objectives (unlike the expected return) are not gradient dominated and that stationary points are not, in general, guaranteed to be globally optimal. Moreover, we present a tight upper bound on the suboptimality of the learned policy, characterizing its dependence on the nonlinearity of the objective and the degree of risk aversion. Addressing (ii), we propose a practical implementation of PG that uses state distribution reweighting to overcome previous limitations. Through experiments, we demonstrate that when the optimality gap is small, PG can learn risk-sensitive policies. However, we find that instances with large suboptimality gaps are abundant and easy to construct, outlining an important challenge for future research.
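
As a rough illustration of the CVaR functional named in the abstract above, the sketch below estimates it from a finite sample of losses by averaging the worst tail. This is not the paper's method and says nothing about the Markov coherent risk itself. The function name `empirical_cvar`, the convention that larger values are worse, and the choice to average the worst `ceil(alpha * n)` samples are all assumptions; other texts use different tail conventions.

```python
import math


def empirical_cvar(losses, alpha):
    """Tail-average estimate of CVaR (illustrative; conventions vary by source).

    losses: sequence of sampled losses, where larger means worse.
    alpha: tail fraction in (0, 1]; alpha = 1 recovers the plain mean.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    n = len(losses)
    if n == 0:
        raise ValueError("need at least one sample")
    k = max(1, math.ceil(alpha * n))          # number of tail samples to keep
    worst = sorted(losses, reverse=True)[:k]  # the k largest losses
    return sum(worst) / k


# Hypothetical sample of losses, used only to exercise the function.
sample_losses = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
tail_risk = empirical_cvar(sample_losses, 0.5)  # average of the worst half
mean_loss = empirical_cvar(sample_losses, 1.0)  # ordinary expectation
```

Under this convention a risk-averse objective weights the bad tail more heavily than the mean does, which is why `tail_risk` is never smaller than `mean_loss`.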

On the Importance of Sampling in Training GCNs: Tighter Analysis and Variance Reduction arxiv:2103.02696 📈 0

Weilin Cong, Morteza Ramezani, Mehrdad Mahdavi

**Abstract:** Graph Convolutional Networks (GCNs) have achieved impressive empirical advancement across a wide variety of semi-supervised node classification tasks. Despite their great success, training GCNs on large graphs suffers from computational and memory issues. A potential path to circumvent these obstacles is sampling-based methods, where at each layer a subset of nodes is sampled. Although recent studies have empirically demonstrated the effectiveness of sampling-based methods, these works lack theoretical convergence guarantees under realistic settings and cannot fully leverage the information of evolving parameters during optimization. In this paper, we describe and analyze a general doubly variance reduction schema that can accelerate any sampling method under the memory budget. The motivating impetus for the proposed schema is a careful analysis of the variance of sampling methods where it is shown that the induced variance can be decomposed into node embedding approximation variance (zeroth-order variance) during forward propagation and layerwise-gradient variance (first-order variance) during backward propagation. We theoretically analyze the convergence of the proposed schema and show that it enjoys an $\mathcal{O}(1/T)$ convergence rate. We complement our theoretical results by integrating the proposed schema in different sampling methods and applying them to different large real-world graphs.

GLAMOUR: Graph Learning over Macromolecule Representations arxiv:2103.02565 📈 0

Somesh Mohapatra, Joyce An, Rafael Gómez-Bombarelli

**Abstract:** The near-infinite chemical diversity of natural and artificial macromolecules arises from the vast range of possible component monomers, linkages, and polymer topologies. This enormous variety contributes to the ubiquity and indispensability of macromolecules but hinders the development of general machine learning methods with macromolecules as input. To address this, we developed GLAMOUR, a framework for chemistry-informed graph representation of macromolecules that enables quantifying structural similarity, and interpretable supervised learning for macromolecules.

Personal Productivity and Well-being -- Chapter 2 of the 2021 New Future of Work Report arxiv:2103.02524 📈 0

Jenna Butler, Mary Czerwinski, Shamsi Iqbal, Sonia Jaffe, Kate Nowak, Emily Peloquin, Longqi Yang

**Abstract:** We now turn to understanding the impact that COVID-19 had on the personal productivity and well-being of information workers as their work practices were impacted by remote work. This chapter overviews people's productivity, satisfaction, and work patterns, and shows that the challenges and benefits of remote work are closely linked. Looking forward, the infrastructure surrounding work will need to evolve to help people adapt to the challenges of remote and hybrid work.

Evaluating Robustness of Counterfactual Explanations arxiv:2103.02354 📈 0

André Artelt, Valerie Vaquet, Riza Velioglu, Fabian Hinder, Johannes Brinkrolf, Malte Schilling, Barbara Hammer

**Abstract:** Transparency is a fundamental requirement for decision making systems when these should be deployed in the real world. It is usually achieved by providing explanations of the system's behavior. Counterfactual explanations are a prominent and intuitive type of explanation. Counterfactual explanations explain a behavior to the user by proposing actions -- as changes to the input -- that would cause a different (specified) behavior of the system. However, such explanation methods can be unstable with respect to small changes to the input -- i.e. even a small change in the input can lead to huge or arbitrary changes in the output and in the explanation. This could be problematic for counterfactual explanations, as two similar individuals might get very different explanations. Even worse, if the recommended actions differ considerably in their complexity, one would consider such unstable (counterfactual) explanations as individually unfair. In this work, we formally and empirically study the robustness of counterfactual explanations in general, as well as under different models and different kinds of perturbations. Furthermore, we propose that plausible counterfactual explanations can be used instead of closest counterfactual explanations to improve the robustness and consequently the individual fairness of counterfactual explanations.

EmoWrite: A Sentiment Analysis-Based Thought to Text Conversion arxiv:2103.02238 📈 0

A. Shahid, I. Raza, S. A. Hussain

**Abstract:** Brain Computer Interface (BCI) helps in processing and extracting useful information from acquired brain signals, with applications in diverse fields such as military, medicine, neuroscience, and rehabilitation. BCI has been used to support paralytic patients having speech impediments with severe disabilities. To help paralytic patients communicate with ease, BCI-based systems convert silent speech (thoughts) to text. However, these systems have an inconvenient graphical user interface, high latency, limited typing speed, and low accuracy rate. Apart from these limitations, the existing systems do not incorporate the inevitable factor of a patient's emotional states and sentiment analysis. The proposed system EmoWrite implements a dynamic keyboard with contextualized appearance of characters, reducing the traversal time and improving the utilization of the screen space. The proposed system has been evaluated and compared with the existing systems for accuracy, convenience, sentiment analysis, and typing speed. This system results in 6.58 Words Per Minute (WPM) and 31.92 Characters Per Minute (CPM) with an accuracy of 90.36 percent. EmoWrite also gives remarkable results when it comes to the integration of emotional states. Its Information Transfer Rate (ITR) is also high compared to other systems, i.e., 87.55 bits per min with commands and 72.52 bits per min for letters. Furthermore, it provides an easy-to-use interface with a latency of 2.685 sec.
