Summary for 2021-10-02, created on 2021-12-16

ProTo: Program-Guided Transformer for Program-Guided Tasks arxiv:2110.00804 📈 49

Zelin Zhao, Karan Samel, Binghong Chen, Le Song

**Abstract:** Programs, consisting of semantic and structural information, play an important role in the communication between humans and agents. Towards learning general program executors to unify perception, reasoning, and decision making, we formulate program-guided tasks which require learning to execute a given program on the observed task specification. Furthermore, we propose the Program-guided Transformer (ProTo), which integrates both semantic and structural guidance of a program by leveraging cross-attention and masked self-attention to pass messages between the specification and routines in the program. ProTo executes a program in a learned latent space and enjoys stronger representation ability than previous neural-symbolic approaches. We demonstrate that ProTo significantly outperforms the previous state-of-the-art methods on GQA visual reasoning and 2D Minecraft policy learning datasets. Additionally, ProTo demonstrates better generalization to unseen, complex, and human-written programs.

Partner-Aware Algorithms in Decentralized Cooperative Bandit Teams arxiv:2110.00751 📈 22

Erdem Bıyık, Anusha Lalitha, Rajarshi Saha, Andrea Goldsmith, Dorsa Sadigh

**Abstract:** When humans collaborate with each other, they often make decisions by observing others and considering the consequences that their actions may have on the entire team, instead of greedily doing what is best for just themselves. We would like our AI agents to effectively collaborate in a similar way by capturing a model of their partners. In this work, we propose and analyze a decentralized Multi-Armed Bandit (MAB) problem with coupled rewards as an abstraction of more general multi-agent collaboration. We demonstrate that naïve extensions of single-agent optimal MAB algorithms fail when applied for decentralized bandit teams. Instead, we propose a Partner-Aware strategy for joint sequential decision-making that extends the well-known single-agent Upper Confidence Bound algorithm. We analytically show that our proposed strategy achieves logarithmic regret, and provide extensive experiments involving human-AI and human-robot collaboration to validate our theoretical findings. Our results show that the proposed partner-aware strategy outperforms other known methods, and our human subject studies suggest humans prefer to collaborate with AI agents implementing our partner-aware strategy.
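
For orientation, the single-agent Upper Confidence Bound rule that the abstract says the Partner-Aware strategy extends can be sketched as follows. This is the textbook UCB1 baseline only, not the paper's decentralized algorithm; the Bernoulli arm means, the horizon, and the function names are assumptions chosen for illustration.

```python
import math
import random


def ucb1_index(mean_reward, pulls, t):
    """Empirical mean plus the UCB1 exploration bonus sqrt(2 ln t / pulls)."""
    return mean_reward + math.sqrt(2.0 * math.log(t) / pulls)


def run_ucb1(arm_means, horizon, seed=0):
    """Play Bernoulli arms with UCB1 and return how often each arm was pulled."""
    rng = random.Random(seed)
    n_arms = len(arm_means)
    pulls = [0] * n_arms
    totals = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # pull every arm once to initialise the estimates
        else:
            arm = max(
                range(n_arms),
                key=lambda a: ucb1_index(totals[a] / pulls[a], pulls[a], t),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        pulls[arm] += 1
        totals[arm] += reward
    return pulls


# Hypothetical two-armed problem: arm 1 is clearly better than arm 0.
pulls = run_ucb1([0.2, 0.8], horizon=2000)
```

The exploration bonus shrinks as an arm is pulled more often, so the better arm ends up receiving most of the pulls.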

Seeking Visual Discomfort: Curiosity-driven Representations for Reinforcement Learning arxiv:2110.00784 📈 6

Elie Aljalbout, Maximilian Ulmer, Rudolph Triebel

**Abstract:** Vision-based reinforcement learning (RL) is a promising approach to solve control tasks involving images as the main observation. State-of-the-art RL algorithms still struggle in terms of sample efficiency, especially when using image observations. This has led to increased attention on integrating state representation learning (SRL) techniques into the RL pipeline. Work in this field demonstrates a substantial improvement in sample efficiency among other benefits. However, to take full advantage of this paradigm, the quality of samples used for training plays a crucial role. More importantly, the diversity of these samples could affect the sample efficiency of vision-based RL, but also its generalization capability. In this work, we present an approach to improve sample diversity for state representation learning. Our method enhances the exploration capability of RL algorithms, by taking advantage of the SRL setup. Our experiments show that our proposed approach boosts the visitation of problematic states, improves the learned state representation, and outperforms the baselines for all tested environments. These results are most apparent for environments where the baseline methods struggle. Even in simple environments, our method stabilizes the training, reduces the reward variance, and promotes sample efficiency.

Learning Models as Functionals of Signed-Distance Fields for Manipulation Planning arxiv:2110.00792 📈 5

Danny Driess, Jung-Su Ha, Marc Toussaint, Russ Tedrake

**Abstract:** This work proposes an optimization-based manipulation planning framework where the objectives are learned functionals of signed-distance fields that represent objects in the scene. Most manipulation planning approaches rely on analytical models and carefully chosen abstractions/state-spaces to be effective. A central question is how models can be obtained from data that are not primarily accurate in their predictions, but, more importantly, enable efficient reasoning within a planning framework, while at the same time being closely coupled to perception spaces. We show that representing objects as signed-distance fields not only enables to learn and represent a variety of models with higher accuracy compared to point-cloud and occupancy measure representations, but also that SDF-based models are suitable for optimization-based planning. To demonstrate the versatility of our approach, we learn both kinematic and dynamic models to solve tasks that involve hanging mugs on hooks and pushing objects on a table. We can unify these quite different tasks within one framework, since SDFs are the common object representation. Video: https://youtu.be/ga8Wlkss7co

Optimizing Neural Network for Computer Vision task in Edge Device arxiv:2110.00791 📈 5

Ranjith M S, S Parameshwara, Pavan Yadav A, Shriganesh Hegde

**Abstract:** The field of computer vision has grown very rapidly in the past few years due to networks such as convolutional neural networks and their variants. The memory required to store such a model and its computational expense are very high, which limits deployment on edge devices. Applications often rely on the cloud, but round-trip delays make real-time operation difficult. We overcome these problems by deploying the neural network on the edge device itself. The computational expense for edge devices is reduced by reducing the floating-point precision of the parameters in the model. As a result, the memory required for the model decreases and the speed of computation increases, while the performance of the model is only minimally affected. This allows an edge device to run predictions from the neural network entirely on its own.
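
A small sketch of what reducing the floating-point precision of parameters does to storage and accuracy. The abstract does not say which precisions the authors use; the float32-to-float16 conversion, the weight values, and the helper names below are assumptions for illustration.

```python
import struct


def pack_weights(weights, fmt):
    """Serialise weights with struct format 'f' (float32) or 'e' (float16)."""
    return struct.pack(f"<{len(weights)}{fmt}", *weights)


def unpack_weights(blob, fmt):
    """Inverse of pack_weights for the same format character."""
    count = len(blob) // struct.calcsize(f"<{fmt}")
    return list(struct.unpack(f"<{count}{fmt}", blob))


weights = [0.1, -0.25, 1.5, 3.14159]  # hypothetical model parameters

full = pack_weights(weights, "f")   # 4 bytes per parameter
half = pack_weights(weights, "e")   # 2 bytes per parameter
restored = unpack_weights(half, "e")

# Largest rounding error introduced by storing the weights in half precision.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

Storage halves, and the rounding error stays small for values of this magnitude; whether that error is acceptable depends on the model and task.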

GROWN: GRow Only When Necessary for Continual Learning arxiv:2110.00908 📈 4

Li Yang, Sen Lin, Junshan Zhang, Deliang Fan

**Abstract:** Catastrophic forgetting is a notorious issue in deep learning, referring to the fact that Deep Neural Networks (DNN) could forget the knowledge about earlier tasks when learning new tasks. To address this issue, continual learning has been developed to learn new tasks sequentially and perform knowledge transfer from the old tasks to the new ones without forgetting. While recent structure-based learning methods show the capability of alleviating the forgetting problem, these methods start from a redundant full-size network and require a complex learning process to gradually grow-and-prune or search the network structure for each task, which is inefficient. To address this problem and enable efficient network expansion for new tasks, we first develop a learnable sparse growth method eliminating the additional pruning/searching step in previous structure-based methods. Building on this learnable sparse growth method, we then propose GROWN, a novel end-to-end continual learning framework to dynamically grow the model only when necessary. Different from all previous structure-based methods, GROWN starts from a small seed network, instead of a full-sized one. We validate GROWN on multiple datasets against state-of-the-art methods, which shows superior performance in both accuracy and model size. For example, we achieve 1.0% accuracy gain on average compared to the current SOTA results on CIFAR-100 Superclass 20 tasks setting.

Weakly Supervised Attention-based Models Using Activation Maps for Citrus Mite and Insect Pest Classification arxiv:2110.00881 📈 4

Edson Bollis, Helena Maia, Helio Pedrini, Sandra Avila

**Abstract:** Citrus juices and fruits are commodities with great economic potential in the international market, but productivity losses caused by mites and other pests are still far from being a good mark. Despite the integrated pest mechanical aspect, only a few works on automatic classification have handled images with orange mite characteristics, which means tiny and noisy regions of interest. On the computational side, attention-based models have gained prominence in deep learning research, and, along with weakly supervised learning algorithms, they have improved tasks performed with some label restrictions. In agronomic research of pests and diseases, these techniques can improve classification performance while pointing out the location of mites and insects without specific labels, reducing deep learning development costs related to generating bounding boxes. In this context, this work proposes an attention-based activation map approach developed to improve the classification of tiny regions called Two-Weighted Activation Mapping, which also produces locations using feature map scores learned from class labels. We apply our method in a two-stage network process called Attention-based Multiple Instance Learning Guided by Saliency Maps. We analyze the proposed approach in two challenging datasets, the Citrus Pest Benchmark, which was captured directly in the field using magnifying glasses, and the Insect Pest, a large pest image benchmark. In addition, we evaluate and compare our models with weakly supervised methods, such as Attention-based Deep MIL and WILDCAT. The results show that our classifier is superior to literature methods that use tiny regions in their classification tasks, surpassing them in all scenarios by at least 16 percentage points. Moreover, our approach infers bounding box locations for salient insects, even training without any location labels.

A Case Study to Reveal if an Area of Interest has a Trend in Ongoing Tweets Using Word and Sentence Embeddings arxiv:2110.00866 📈 4

İsmail Aslan, Yücel Topçu

**Abstract:** In the field of Natural Language Processing, information extraction from texts has been the objective of many researchers for years. Many different techniques have been applied to reveal the opinion that a tweet might express, and thus to understand the sentiment of a short text of up to 280 characters. Other than figuring out the sentiment of a tweet, a study can also focus on finding the correlation of the tweets with a certain area of interest, which constitutes the purpose of this study. In order to reveal if an area of interest has a trend in ongoing tweets, we propose an easily applicable automated methodology in which the Daily Mean Similarity Scores, which show the similarity between the daily tweet corpus and the target words representing our area of interest, are calculated by using a naïve correlation-based technique without training any Machine Learning Model. The Daily Mean Similarity Scores are mainly based on cosine similarity and word/sentence embeddings computed by Multilanguage Universal Sentence Encoder, and they show the main opinion stream of the tweets with respect to a certain area of interest, which indicates that an ongoing trend of a specific subject on Twitter can easily be captured in almost real time by using the methodology proposed in this study. We have also compared the effectiveness of using word versus sentence embeddings while applying our methodology and found that both give almost the same results, whereas using word embeddings requires less computational time than sentence embeddings, thus being more efficient. This paper starts with an introduction followed by background information about the basics, then continues with the explanation of the proposed methodology, and finishes by interpreting the results and concluding the findings.
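
A rough sketch of how a daily mean similarity score could be computed from cosine similarity. The abstract does not spell out the aggregation, so averaging over every tweet/target pair is an assumption, and the two-dimensional vectors stand in for real embeddings from a sentence encoder.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def daily_mean_similarity(tweet_vectors, target_vectors):
    """Average cosine similarity over all (tweet, target) pairs for one day."""
    scores = [
        cosine_similarity(tweet, target)
        for tweet in tweet_vectors
        for target in target_vectors
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical embeddings: one tweet aligned with the target, one orthogonal to it.
day_score = daily_mean_similarity([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0]])
```

Tracking this score day by day gives a simple trend signal without training a model.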

A Comparative Study of Sentiment Analysis Using NLP and Different Machine Learning Techniques on US Airline Twitter Data arxiv:2110.00859 📈 4

Md. Taufiqul Haque Khan Tusar, Md. Touhidul Islam

**Abstract:** Today's business ecosystem has become very competitive. Customer satisfaction has become a major focus for business growth. Business organizations are spending a lot of money and human resources on various strategies to understand and fulfill their customer's needs. But, because of defective manual analysis on multifarious needs of customers, many organizations are failing to achieve customer satisfaction. As a result, they are losing customer's loyalty and spending extra money on marketing. We can solve the problems by implementing Sentiment Analysis. It is a combined technique of Natural Language Processing (NLP) and Machine Learning (ML). Sentiment Analysis is broadly used to extract insights from wider public opinion behind certain topics, products, and services. We can do it from any online available data. In this paper, we have introduced two NLP techniques (Bag-of-Words and TF-IDF) and various ML classification algorithms (Support Vector Machine, Logistic Regression, Multinomial Naive Bayes, Random Forest) to find an effective approach for Sentiment Analysis on a large, imbalanced, and multi-classed dataset. Our best approaches provide 77% accuracy using Support Vector Machine and Logistic Regression with Bag-of-Words technique.

SurvTRACE: Transformers for Survival Analysis with Competing Events arxiv:2110.00855 📈 4

Zifeng Wang, Jimeng Sun

**Abstract:** In medicine, survival analysis studies the time duration to events of interest such as mortality. One major challenge is how to deal with multiple competing events (e.g., multiple disease diagnoses). In this work, we propose a transformer-based model, namely SurvTRACE, that makes no assumption about the underlying survival distribution and is capable of handling competing events. We account for the implicit *confounders* in the observational setting in multi-event scenarios, which cause selection bias as the predicted survival probability is influenced by irrelevant factors. To sufficiently utilize the survival data to train transformers from scratch, multiple auxiliary tasks are designed for multi-task learning. The model hence learns a strong shared representation from all these tasks, which in turn supports better survival analysis. We further demonstrate how to inspect covariate relevance and importance through the interpretable attention mechanisms of SurvTRACE, which shows great potential for enhancing clinical trial design and new treatment development. Experiments on METABRIC, SUPPORT, and SEER data with 470k patients validate the all-around superiority of our method.

Explainable Event Recognition arxiv:2110.00755 📈 4

Imran Khan, Kashif Ahmad, Namra Gul, Talhat Khan, Nasir Ahmad, Ala Al-Fuqaha

**Abstract:** The literature shows outstanding capabilities for CNNs in event recognition in images. However, fewer attempts have been made to analyze the potential causes behind the decisions of the models and to explore whether the predictions are based on event-salient objects or regions. To explore this important aspect of event recognition, in this work, we propose an explainable event recognition framework relying on Grad-CAM and an Xception architecture-based CNN model. Experiments are conducted on three large-scale datasets covering a diversified set of natural disasters, social, and sports events. Overall, the model showed outstanding generalization capabilities, obtaining overall F1-scores of 0.91, 0.94, and 0.97 on natural disasters, social, and sports events, respectively. Moreover, for subjective analysis of activation maps generated through Grad-CAM for the predicted samples of the model, a crowdsourcing study is conducted to analyze whether the model's predictions are based on event-related objects/regions. The results of the study indicate that 78%, 84%, and 78% of the model decisions on natural disasters, sports, and social events datasets, respectively, are based on event-related objects or regions.

End-to-End Complex-Valued Multidilated Convolutional Neural Network for Joint Acoustic Echo Cancellation and Noise Suppression arxiv:2110.00745 📈 4

Karn N. Watcharasupat, Thi Ngoc Tho Nguyen, Woon-Seng Gan, Shengkui Zhao, Bin Ma

**Abstract:** Echo and noise suppression is an integral part of a full-duplex communication system. Many recent acoustic echo cancellation (AEC) systems rely on a separate adaptive filtering module for linear echo suppression and a neural module for residual echo suppression. However, not only do adaptive filtering modules require convergence and remain susceptible to changes in acoustic environments, but this two-stage framework also often introduces unnecessary delays to the AEC system when neural modules are already capable of both linear and nonlinear echo suppression. In this paper, we exploit the offset-compensating ability of complex time-frequency masks and propose an end-to-end complex-valued neural network architecture. The building block of the proposed model is a pseudocomplex extension based on the densely-connected multidilated DenseNet (D3Net) building block, resulting in a very small network of only 354K parameters. The architecture utilized the multi-resolution nature of the D3Net building blocks to eliminate the need for pooling, allowing the network to extract features using large receptive fields without any loss of output resolution. We also propose a dual-mask technique for joint echo and noise suppression with simultaneous speech enhancement. Evaluation on both synthetic and real test sets demonstrated promising results across multiple energy-based metrics and perceptual proxies.

FICGAN: Facial Identity Controllable GAN for De-identification arxiv:2110.00740 📈 4

Yonghyun Jeong, Jooyoung Choi, Sungwon Kim, Youngmin Ro, Tae-Hyun Oh, Doyeon Kim, Heonseok Ha, Sungroh Yoon

**Abstract:** In this work, we present Facial Identity Controllable GAN (FICGAN) for not only generating high-quality de-identified face images with ensured privacy protection, but also detailed controllability on attribute preservation for enhanced data utility. We tackle the less-explored yet desired functionality in face de-identification based on the two factors. First, we focus on the challenging issue to obtain a high level of privacy protection in the de-identification task while uncompromising the image quality. Second, we analyze the facial attributes related to identity and non-identity and explore the trade-off between the degree of face de-identification and preservation of the source attributes for enhanced data utility. Based on the analysis, we develop Facial Identity Controllable GAN (FICGAN), an autoencoder-based conditional generative model that learns to disentangle the identity attributes from non-identity attributes on a face image. By applying the manifold k-same algorithm to satisfy k-anonymity for strengthened security, our method achieves enhanced privacy protection in de-identified face images. Numerous experiments demonstrate that our model outperforms others in various scenarios of face de-identification.

Hierarchical Gaussian Process Models for Regression Discontinuity/Kink under Sharp and Fuzzy Designs arxiv:2110.00921 📈 3

Ximing Wu

**Abstract:** We propose nonparametric Bayesian estimators for causal inference exploiting Regression Discontinuity/Kink (RD/RK) under sharp and fuzzy designs. Our estimators are based on Gaussian Process (GP) regression and classification. The GP methods are powerful probabilistic modeling approaches that are advantageous in terms of derivative estimation and uncertainty qualification, facilitating RK estimation and inference of RD/RK models. These estimators are extended to hierarchical GP models with an intermediate Bayesian neural network layer and can be characterized as hybrid deep learning models. Monte Carlo simulations show that our estimators perform similarly and often better than competing estimators in terms of precision, coverage and interval length. The hierarchical GP models improve upon one-layer GP models substantially. An empirical application of the proposed estimators is provided.

Enhancing Model Robustness and Fairness with Causality: A Regularization Approach arxiv:2110.00911 📈 3

Zhao Wang, Kai Shu, Aron Culotta

**Abstract:** Recent work has raised concerns on the risk of spurious correlations and unintended biases in statistical machine learning models that threaten model robustness and fairness. In this paper, we propose a simple and intuitive regularization approach to integrate causal knowledge during model training and build a robust and fair model by emphasizing causal features and de-emphasizing spurious features. Specifically, we first manually identify causal and spurious features with principles inspired from the counterfactual framework of causal inference. Then, we propose a regularization approach to penalize causal and spurious features separately. By adjusting the strength of the penalty for each type of feature, we build a predictive model that relies more on causal features and less on non-causal features. We conduct experiments to evaluate model robustness and fairness on three datasets with multiple metrics. Empirical results show that the new models built with causal awareness significantly improve model robustness with respect to counterfactual texts and model fairness with respect to sensitive attributes.

Inference-InfoGAN: Inference Independence via Embedding Orthogonal Basis Expansion arxiv:2110.00788 📈 3

Hongxiang Jiang, Jihao Yin, Xiaoyan Luo, Fuxiang Wang

**Abstract:** Disentanglement learning aims to construct independent and interpretable latent variables, for which generative models are a popular strategy. InfoGAN is a classic method that maximizes Mutual Information (MI) to obtain interpretable latent variables mapped to the target space. However, it does not emphasize the independence of these variables. To explicitly infer latent variables with inter-independence, we propose a novel GAN-based disentanglement framework that embeds Orthogonal Basis Expansion (OBE) into the InfoGAN network (Inference-InfoGAN) in an unsupervised way. Under the OBE module, a set of orthogonal bases can be adaptively found to expand arbitrary data with the independence property. To ensure a target-wise interpretable representation, we add a consistency constraint between the expansion coefficients and latent variables on the basis of MI maximization. Additionally, we design an alternating optimization step over the consistency constraint and the orthogonality requirement, so that training Inference-InfoGAN is more convenient. Finally, experiments validate that our proposed OBE module obtains an adaptive orthogonal basis, which expresses independent characteristics better than the fixed basis of the Discrete Cosine Transform (DCT). To assess performance in downstream tasks, we compare with state-of-the-art GAN-based and even VAE-based approaches on different datasets. Our Inference-InfoGAN achieves higher disentanglement scores in terms of the FactorVAE, Separated Attribute Predictability (SAP), Mutual Information Gap (MIG) and Variation Predictability (VP) metrics without model fine-tuning. All the experimental results illustrate that our method has inter-independence inference ability because of the OBE module, and provides a good trade-off between it and target-wise interpretability of latent variables through the joint alternating optimization.

Evaluating Deep Learning Models and Adversarial Attacks on Accelerometer-Based Gesture Authentication arxiv:2110.14597 📈 2

Elliu Huang, Fabio Di Troia, Mark Stamp

**Abstract:** Gesture-based authentication has emerged as a non-intrusive, effective means of authenticating users on mobile devices. Typically, such authentication techniques have relied on classical machine learning techniques, but recently, deep learning techniques have been applied to this problem. Although prior research has shown that deep learning models are vulnerable to adversarial attacks, relatively little research has been done in the adversarial domain for behavioral biometrics. In this research, we collect tri-axial accelerometer gesture data (TAGD) from 46 users and perform classification experiments with both classical machine learning and deep learning models. Specifically, we train and test support vector machines (SVM) and convolutional neural networks (CNN). We then consider a realistic adversarial attack, where we assume the attacker has access to real users' TAGD data, but not the authentication model. We use a deep convolutional generative adversarial network (DC-GAN) to create adversarial samples, and we show that our deep learning model is surprisingly robust to such an attack scenario.

Attention module improves both performance and interpretability of 4D fMRI decoding neural network arxiv:2110.00920 📈 2

Zhoufan Jiang, Yanming Wang, ChenWei Shi, Yueyang Wu, Rongjie Hu, Shishuo Chen, Sheng Hu, Xiaoxiao Wang, Bensheng Qiu

**Abstract:** Decoding brain cognitive states from neuroimaging signals is an important topic in neuroscience. In recent years, deep neural networks (DNNs) have been recruited for multiple brain state decoding and achieved good performance. However, the open question of how to interpret the DNN black box remains unanswered. Capitalizing on advances in machine learning, we integrated attention modules into brain decoders to facilitate an in-depth interpretation of DNN channels. A 4D convolution operation was also included to extract temporo-spatial interaction within the fMRI signal. The experiments showed that the proposed model obtains a very high accuracy (97.4%) and outperforms previous work on the 7 different task benchmarks from the Human Connectome Project (HCP) dataset. The visualization analysis further illustrated the hierarchical emergence of task-specific masks with depth. Finally, the model was retrained to regress individual traits within the HCP and to classify viewing images from the BOLD5000 dataset, respectively. Transfer learning also achieves good performance. A further visualization analysis shows that, after transfer learning, low-level attention masks remained similar to the source domain, whereas high-level attention masks changed adaptively. In conclusion, the proposed 4D model with attention module performed well and facilitated interpretation of DNNs, which is helpful for subsequent research.

Online Incremental Non-Gaussian Inference for SLAM Using Normalizing Flows arxiv:2110.00876 📈 2

Qiangqiang Huang, Can Pu, Kasra Khosoussi, David M. Rosen, Dehann Fourie, Jonathan P. How, John J. Leonard

**Abstract:** This paper presents a novel non-Gaussian inference algorithm, Normalizing Flow iSAM (NF-iSAM), for solving SLAM problems with non-Gaussian factors and/or nonlinear measurement models. NF-iSAM exploits the expressive power of neural networks to model normalizing flows that can accurately approximate the joint posterior of highly nonlinear and non-Gaussian factor graphs. By leveraging the Bayes tree, NF-iSAM is able to exploit the sparsity structure of SLAM, thus enabling efficient incremental updates similar to iSAM2, although in the more challenging non-Gaussian setting. We demonstrate the performance of NF-iSAM and compare it against state-of-the-art algorithms such as iSAM2 (Gaussian) and mm-iSAM (non-Gaussian) in synthetic and real range-only SLAM datasets with data association ambiguity.

Feel-Good Thompson Sampling for Contextual Bandits and Reinforcement Learning arxiv:2110.00871 📈 2

Tong Zhang

**Abstract:** Thompson Sampling has been widely used for contextual bandit problems due to the flexibility of its modeling power. However, a general theory for this class of methods in the frequentist setting is still lacking. In this paper, we present a theoretical analysis of Thompson Sampling, with a focus on frequentist regret bounds. In this setting, we show that the standard Thompson Sampling is not aggressive enough in exploring new actions, leading to suboptimality in some pessimistic situations. A simple modification called Feel-Good Thompson Sampling, which favors high reward models more aggressively than the standard Thompson Sampling, is proposed to remedy this problem. We show that the theoretical framework can be used to derive Bayesian regret bounds for standard Thompson Sampling, and frequentist regret bounds for Feel-Good Thompson Sampling. It is shown that in both cases, we can reduce the bandit regret problem to online least squares regression estimation. For the frequentist analysis, the online least squares regression bound can be directly obtained using online aggregation techniques which have been well studied. The resulting bandit regret bound matches the minimax lower bound in the finite action case. Moreover, the analysis can be generalized to handle a class of linearly embeddable contextual bandit problems (which generalizes the popular linear contextual bandit model). The obtained result again matches the minimax lower bound. Finally we illustrate that the analysis can be extended to handle some MDP problems.

Learning Networked Linear Dynamical Systems under Non-white Excitation from a Single Trajectory arxiv:2110.00852 📈 2

Harish Doddi, Deepjyoti Deka, Saurav Talukdar, Murti Salapaka

**Abstract:** We consider a networked linear dynamical system with $p$ agents/nodes. We study the problem of learning the underlying graph of interactions/dependencies from observations of the nodal trajectories over a time-interval $T$. We present a regularized non-causal consistent estimator for this problem and analyze its sample complexity over two regimes: (a) where the interval $T$ consists of $n$ i.i.d. observation windows of length $T/n$ (restart and record), and (b) where $T$ is one continuous observation window (consecutive). Using the theory of $M$-estimators, we show that the estimator recovers the underlying interactions, in either regime, in a time-interval that is logarithmic in the system size $p$. To the best of our knowledge, this is the first work to analyze the sample complexity of learning linear dynamical systems driven by unobserved non-white wide-sense stationary (WSS) inputs.

Welsch Based Multiview Disparity Estimation arxiv:2110.00803 📈 2

James L. Gray, Aous T. Naman, David S. Taubman

**Abstract:** In this work, we explore disparity estimation from a high number of views. We experimentally identify occlusions as a key challenge for disparity estimation for applications with high numbers of views. In particular, occlusions can actually result in a degradation in accuracy as more views are added to a dataset. We propose the use of a Welsch loss function for the data term in a global variational framework for disparity estimation. We also propose a disciplined warping strategy and a progressive inclusion of views strategy that can reduce the need for coarse to fine strategies that discard high spatial frequency components from the early iterations. Experimental results demonstrate that the proposed approach produces superior and/or more robust estimates than other conventional variational approaches.
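
To illustrate why a Welsch loss helps with occlusions, the sketch below compares it with a squared loss. The parameterization shown is one common form; the abstract does not state which scaling or scale parameter `c` the authors use, so both are assumptions.

```python
import math


def welsch_loss(residual, c=1.0):
    """Welsch loss (c^2 / 2) * (1 - exp(-(r / c)^2)); bounded above by c^2 / 2."""
    return 0.5 * c * c * (1.0 - math.exp(-((residual / c) ** 2)))


def squared_loss(residual):
    """Quadratic loss r^2 / 2, which grows without bound."""
    return 0.5 * residual * residual


# A small residual is penalised almost identically by both losses,
# while a large (e.g. occlusion-induced) residual saturates under Welsch.
small = (welsch_loss(0.01), squared_loss(0.01))
large = (welsch_loss(100.0), squared_loss(100.0))
```

Because the penalty saturates, a view in which a pixel is occluded contributes a bounded cost instead of dominating the data term.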

Random Subgraph Detection Using Queries arxiv:2110.00744 📈 2

Wasim Huleihel, Arya Mazumdar, Soumyabrata Pal

**Abstract:** The planted densest subgraph detection problem refers to the task of testing whether in a given (random) graph there is a subgraph that is unusually dense. Specifically, we observe an undirected and unweighted graph on $n$ nodes. Under the null hypothesis, the graph is a realization of an Erdős-Rényi graph with edge probability (or, density) $q$. Under the alternative, there is a subgraph on $k$ vertices with edge probability $p>q$. The statistical as well as the computational barriers of this problem are well-understood for a wide range of the edge parameters $p$ and $q$. In this paper, we consider a natural variant of the above problem, where one can only observe a small part of the graph using adaptive edge queries. For this model, we determine the number of queries necessary and sufficient for detecting the presence of the planted subgraph. Specifically, we show that any (possibly randomized) algorithm must make $\mathsf{Q} = \Omega(\frac{n^2}{k^2\chi^4(p\|q)}\log^2 n)$ adaptive queries (in expectation) to the adjacency matrix of the graph to detect the planted subgraph with probability more than $1/2$, where $\chi^2(p\|q)$ is the Chi-Square distance. On the other hand, we devise a quasi-polynomial-time algorithm that detects the planted subgraph with high probability by making $\mathsf{Q} = O(\frac{n^2}{k^2\chi^4(p\|q)}\log^2 n)$ non-adaptive queries. We then propose a polynomial-time algorithm which is able to detect the planted subgraph using $\mathsf{Q} = O(\frac{n^3}{k^3\chi^2(p\|q)}\log^3 n)$ queries. We conjecture that in the leftover regime, where $\frac{n^2}{k^2}\ll\mathsf{Q}\ll \frac{n^3}{k^3}$, no polynomial-time algorithms exist. Our results resolve two questions posed in \cite{racz2020finding}, where the special case of adaptive detection and recovery of a planted clique was considered.
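
To get a feel for the query bound, the sketch below evaluates its leading term numerically. It assumes the Chi-Square distance is taken between Bernoulli($p$) and Bernoulli($q$) edge distributions, which gives $(p-q)^2 / (q(1-q))$; the abstract does not spell this out, and the constants hidden by the asymptotic notation are dropped.

```python
import math


def chi_square_bernoulli(p, q):
    """Chi-square divergence between Bernoulli(p) and Bernoulli(q)."""
    return (p - q) ** 2 / (q * (1.0 - q))


def query_scale(n, k, p, q):
    """Leading term n^2 / (k^2 * chi^4) * log^2(n), with constants omitted."""
    chi2 = chi_square_bernoulli(p, q)
    return (n ** 2) / (k ** 2 * chi2 ** 2) * math.log(n) ** 2


# Hypothetical instance: 1000 nodes, planted subgraph on 100 vertices.
scale_small_k = query_scale(1000, 100, 0.75, 0.5)
scale_large_k = query_scale(1000, 200, 0.75, 0.5)
```

Doubling the planted subgraph size `k` cuts this term by a factor of four, while a smaller gap between `p` and `q` raises it sharply through the $\chi^4$ factor.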

A Robust Alternative for Graph Convolutional Neural Networks via Graph Neighborhood Filters arxiv:2110.00844 📈 1

Victor M. Tenorio, Samuel Rey, Fernando Gama, Santiago Segarra, Antonio G. Marques

**Abstract:** Graph convolutional neural networks (GCNNs) are popular deep learning architectures that, upon replacing regular convolutions with graph filters (GFs), generalize CNNs to irregular domains. However, classical GFs are prone to numerical errors since they consist of high-order polynomials. This problem is aggravated when several filters are applied in cascade, limiting the practical depth of GCNNs. To tackle this issue, we present the neighborhood graph filters (NGFs), a family of GFs that replaces the powers of the graph shift operator with $k$-hop neighborhood adjacency matrices. NGFs help to alleviate the numerical issues of traditional GFs, allow for the design of deeper GCNNs, and enhance the robustness to errors in the topology of the graph. To illustrate the advantage over traditional GFs in practical applications, we use NGFs in the design of deep neighborhood GCNNs to solve graph signal denoising and node classification problems over both synthetic and real-world data.
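
A minimal sketch of what a $k$-hop neighborhood adjacency matrix could look like, under the assumption (not confirmed by the abstract) that entry $(i, j)$ is 1 when node $j$ is exactly $k$ hops from node $i$. The function names and the example graph are hypothetical. Unlike the powers of an adjacency matrix, whose entries count walks and can grow quickly, these matrices stay binary.

```python
from collections import deque


def hop_distances(adj, source):
    """Breadth-first search: shortest hop count from `source` to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def k_hop_matrix(adj, k):
    """Binary matrix with a 1 at (i, j) when j is exactly k hops from i (assumed definition)."""
    n = len(adj)
    mat = [[0] * n for _ in range(n)]
    for i in range(n):
        for j, d in hop_distances(adj, i).items():
            if d == k:
                mat[i][j] = 1
    return mat


# Path graph 0 - 1 - 2 - 3, given as adjacency lists.
path = [[1], [0, 2], [1, 3], [2]]
print(k_hop_matrix(path, 2))
```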

SHARP: Shielding-Aware Robust Planning for Safe and Efficient Human-Robot Interaction arxiv:2110.00843 📈 1

Haimin Hu, Kensuke Nakamura, Jaime F. Fisac

**Abstract:** Jointly achieving safety and efficiency in human-robot interaction (HRI) settings is a challenging problem, as the robot's planning objectives may be at odds with the human's own intent and expectations. Recent approaches ensure safe robot operation in uncertain environments through a supervisory control scheme, sometimes called "shielding", which overrides the robot's nominal plan with a safety fallback strategy when a safety-critical event is imminent. These reactive "last-resort" strategies (typically in the form of aggressive emergency maneuvers) focus on preserving safety without efficiency considerations; when the nominal planner is unaware of possible safety overrides, shielding can be activated more frequently than necessary, leading to degraded performance. In this work, we propose a new shielding-based planning approach that allows the robot to plan efficiently by explicitly accounting for possible future shielding events. Leveraging recent work on Bayesian human motion prediction, the resulting robot policy proactively balances nominal performance with the risk of high-cost emergency maneuvers triggered by low-probability human behaviors. We formalize Shielding-Aware Robust Planning (SHARP) as a stochastic optimal control problem and propose a computationally efficient framework for finding tractable approximate solutions at runtime. Our method outperforms the shielding-agnostic motion planning baseline (equipped with the same human intent inference scheme) on simulated driving examples with human trajectories taken from the recently released Waymo Open Motion Dataset.

Induction, Popper, and machine learning arxiv:2110.00840 📈 1

Bruce Nielson, Daniel C. Elton

**Abstract:** Francis Bacon popularized the idea that science is based on a process of induction by which repeated observations are, in some unspecified way, generalized to theories based on the assumption that the future resembles the past. This idea was criticized by Hume and others as untenable, leading to the famous problem of induction. It wasn't until the work of Karl Popper that this problem was solved, by demonstrating that induction is not the basis for science and that the development of scientific knowledge is instead based on the same principles as biological evolution. Today, machine learning is also taught as being rooted in induction from big data. Solomonoff induction implemented in an idealized Bayesian agent (Hutter's AIXI) is widely discussed and touted as a framework for understanding AI algorithms, even though real-world attempts to implement something like AIXI immediately encounter fatal problems. In this paper, we contrast frameworks based on induction with Donald T. Campbell's universal Darwinism. We show that most AI algorithms in use today can be understood as using an evolutionary trial and error process searching over a solution space. In this work we argue that a universal Darwinian framework provides a better foundation for understanding AI systems. Moreover, at a more meta level, the process of development of all AI algorithms can be understood under the framework of universal Darwinism.

Generating User-Centred Explanations via Illocutionary Question Answering: From Philosophy to Interfaces arxiv:2110.00762 📈 1

Francesco Sovrano, Fabio Vitali

**Abstract:** We propose a new method for generating explanations with Artificial Intelligence (AI) and a tool to test its expressive power within a user interface. In order to bridge the gap between philosophy and human-computer interfaces, we show a new approach for the generation of interactive explanations based on a sophisticated pipeline of AI algorithms for structuring natural language documents into knowledge graphs, answering questions effectively and satisfactorily. With this work we aim to prove that the philosophical theory of explanations presented by Achinstein can be actually adapted for being implemented into a concrete software application, as an interactive and illocutionary process of answering questions. Specifically, our contribution is an approach to frame illocution in a computer-friendly way, to achieve user-centrality with statistical question answering. In fact, we frame illocution, in an explanatory process, as that mechanism responsible for anticipating the needs of the explainee in the form of unposed, implicit, archetypal questions, hence improving the user-centrality of the underlying explanatory process. More precisely, we hypothesise that given an arbitrary explanatory process, increasing its goal-orientedness and degree of illocution results in the generation of more usable (as per ISO 9241-210) explanations. We tested our hypotheses with a user-study involving more than 60 participants, on two XAI-based systems, one for credit approval (finance) and one for heart disease prediction (healthcare). The results showed that our proposed solution produced a statistically significant improvement (hence with a p-value lower than 0.05) on effectiveness. This, combined with a visible alignment between the increments in effectiveness and satisfaction, suggests that our understanding of illocution can be correct, giving evidence in favour of our theory.

Making Things Explainable vs Explaining: Requirements and Challenges under the GDPR arxiv:2110.00758 📈 1

Francesco Sovrano, Fabio Vitali, Monica Palmirani

**Abstract:** The European Union (EU) through the High-Level Expert Group on Artificial Intelligence (AI-HLEG) and the General Data Protection Regulation (GDPR) has recently posed an interesting challenge to the eXplainable AI (XAI) community, by demanding a more user-centred approach to explain Automated Decision-Making systems (ADMs). Looking at the relevant literature, XAI is currently focused on producing explainable software and explanations that generally follow an approach we could term One-Size-Fits-All, which is unable to meet the requirement of centring on user needs. One of the causes of this limit is the belief that making things explainable alone is enough to have pragmatic explanations. Thus, insisting on a clear separation between explainability (something that can be explained) and explanations, we point to explanatorY AI (YAI) as an alternative and more powerful approach to win the AI-HLEG challenge. YAI builds over XAI with the goal to collect and organize explainable information, articulating it into something we called user-centred explanatory discourses. Through the use of explanatory discourses/narratives we cast the problem of generating explanations for Automated Decision-Making systems (ADMs) as the identification of an appropriate path over an explanatory space, allowing explainees to interactively explore it and produce the explanation best suited to their needs.

Implementation of MPPT Technique of Solar Module with Supervised Machine Learning arxiv:2110.00728 📈 1

Ruhi Sharmin, Sayeed Shafayet Chowdhury, Farihal Abedin, Kazi Mujibur Rahman

**Abstract:** In this paper, we propose a method that uses supervised ML for MPPT analysis in a solar PV system. For this purpose, an overall schematic diagram of a PV system is designed and simulated in MATLAB/Simulink to create a dataset. By analyzing the output characteristics of a solar cell, an improved MPPT algorithm based on a neural network (NN) is put forward to track the maximum power point (MPP) of solar cell modules. The Bayesian Regularization method was chosen as the training algorithm because it works well even with smaller datasets while supporting the wide range of the training data. The theoretical results show that the improved NN MPPT algorithm has higher efficiency than the Perturb and Observe method in the same environment, and the PV system can keep working at the MPP without oscillation or any probability of misjudgment. It can therefore not only reduce misjudgment but also avoid power loss around the MPP. Moreover, we implemented the algorithm in a hardware set-up and verified the theoretical result by comparing it with the empirical data.

Using Out-of-the-Box Frameworks for Contrastive Unpaired Image Translation for Vestibular Schwannoma and Cochlea Segmentation: An approach for the crossMoDA Challenge arxiv:2110.01607 📈 0

Jae Won Choi

**Abstract:** The purpose of this study is to apply and evaluate out-of-the-box deep learning frameworks for the crossMoDA challenge. We use the CUT model, a model for unpaired image-to-image translation based on patchwise contrastive learning and adversarial learning, for domain adaptation from contrast-enhanced T1 MR to high-resolution T2 MR. As data augmentation, we generate additional images with vestibular schwannomas with lower signal intensity. For the segmentation task, we use the nnU-Net framework. Our final submission achieved mean Dice scores of 0.8299 in the validation phase and 0.8253 in the test phase. Our method ranked 3rd in the crossMoDA challenge.
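
For readers unfamiliar with the metric reported above, here is a minimal sketch of the Dice score for binary masks, $\mathrm{Dice} = 2|A \cap B| / (|A| + |B|)$. It is not the challenge's evaluation code; the flat-list mask representation and the convention of returning 1.0 for two empty masks are assumptions made for illustration.

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks given as flat lists of 0/1."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    if total == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * intersection / total


# Hypothetical 4-voxel masks: one voxel overlaps, sizes are 2 and 1.
print(dice_score([1, 1, 0, 0], [1, 0, 0, 0]))
```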

Disarranged Zone Learning (DZL): An unsupervised and dynamic automatic stenosis recognition methodology based on coronary angiography arxiv:2110.00896 📈 0

Yanan Dai, Pengxiong Zhu, Bangde Xue, Yun Ling, Xibao Shi, Liang Geng, Qi Zhang, Jun Liu

**Abstract:** We propose a novel unsupervised methodology named Disarranged Zone Learning (DZL) to automatically recognize stenosis in coronary angiography. The methodology first disarranges the frames in a video, then generates an effective zone, and finally trains an encoder-decoder GRU model to learn to recover the disarranged frames. The breakthrough of our study is to discover and validate that the Sequence Intensity (Recover Difficulty) is a measure of Coronary Artery Stenosis Status. Hence, the prediction accuracy of DZL is used as an approximator of the coronary stenosis indicator. DZL is an unsupervised methodology that needs no label engineering effort; the sub GRU model in DZL works as a self-supervised approach. DZL could therefore, in theory, use arbitrarily large amounts of coronary angiographies to learn and improve performance without laborious data labeling. DZL has no data preprocessing precondition, as it dynamically uses the whole video; hence it is easy to implement and to generalize to overcome the data heterogeneity of coronary angiography. The overall average precision score reaches 0.93 and the AUC reaches 0.8 for this pure methodology. The highest segmented average precision score is 0.98 and the highest segmented AUC is 0.87 for the coronary occlusion indicator. Finally, we developed a software demo to implement the DZL methodology.
