
Summary for 2022-02-28, created on 2022-03-10

Logical Fallacy Detection arxiv:2202.13758 📈 176

Zhijing Jin, Abhinav Lalwani, Tejas Vaidhya, Xiaoyu Shen, Yiwen Ding, Zhiheng Lyu, Mrinmaya Sachan, Rada Mihalcea, Bernhard Schölkopf

**Abstract:** Reasoning is central to human intelligence. However, fallacious arguments are common, and some exacerbate problems such as spreading misinformation about climate change. In this paper, we propose the task of logical fallacy detection, and provide a new dataset (Logic) of logical fallacies generally found in text, together with an additional challenge set for detecting logical fallacies in climate change claims (LogicClimate). Detecting logical fallacies is a hard problem as the model must understand the underlying logical structure of the argument. We find that existing pretrained large language models perform poorly on this task. In contrast, we show that a simple structure-aware classifier outperforms the best language model by 5.46% on Logic and 3.86% on LogicClimate. We encourage future work to explore this task as (a) it can serve as a new reasoning challenge for language models, and (b) it can have potential applications in tackling the spread of misinformation.

State-of-the-Art in the Architecture, Methods and Applications of StyleGAN arxiv:2202.14020 📈 107

Amit H. Bermano, Rinon Gal, Yuval Alaluf, Ron Mokady, Yotam Nitzan, Omer Tov, Or Patashnik, Daniel Cohen-Or

**Abstract:** Generative Adversarial Networks (GANs) have established themselves as a prevalent approach to image synthesis. Of these, StyleGAN offers a fascinating case study, owing to its remarkable visual quality and an ability to support a large array of downstream tasks. This state-of-the-art report covers the StyleGAN architecture, and the ways it has been employed since its conception, while also analyzing its severe limitations. It aims to be of use for both newcomers, who wish to get a grasp of the field, and for more experienced readers that might benefit from seeing current research trends and existing tools laid out. Among StyleGAN's most interesting aspects is its learned latent space. Despite being learned with no supervision, it is surprisingly well-behaved and remarkably disentangled. Combined with StyleGAN's visual quality, these properties gave rise to unparalleled editing capabilities. However, the control offered by StyleGAN is inherently limited to the generator's learned distribution, and can only be applied to images generated by StyleGAN itself. Seeking to bring StyleGAN's latent control to real-world scenarios, the study of GAN inversion and latent space embedding has quickly gained in popularity. Meanwhile, this same study has helped shed light on the inner workings and limitations of StyleGAN. We map out StyleGAN's impressive story through these investigations, and discuss the details that have made StyleGAN the go-to generator. We further elaborate on the visual priors StyleGAN constructs, and discuss their use in downstream discriminative tasks. Looking forward, we point out StyleGAN's limitations and speculate on current trends and promising directions for future research, such as task and target specific fine-tuning.

Understanding Contrastive Learning Requires Incorporating Inductive Biases arxiv:2202.14037 📈 87

Nikunj Saunshi, Jordan Ash, Surbhi Goel, Dipendra Misra, Cyril Zhang, Sanjeev Arora, Sham Kakade, Akshay Krishnamurthy

**Abstract:** Contrastive learning is a popular form of self-supervised learning that encourages augmentations (views) of the same input to have more similar representations compared to augmentations of different inputs. Recent attempts to theoretically explain the success of contrastive learning on downstream classification tasks prove guarantees depending on properties of {\em augmentations} and the value of {\em contrastive loss} of representations. We demonstrate that such analyses, that ignore {\em inductive biases} of the function class and training algorithm, cannot adequately explain the success of contrastive learning, even {\em provably} leading to vacuous guarantees in some settings. Extensive experiments on image and text domains highlight the ubiquity of this problem -- different function classes and algorithms behave very differently on downstream tasks, despite having the same augmentations and contrastive losses. Theoretical analysis is presented for the class of linear representations, where incorporating inductive biases of the function class allows contrastive learning to work with less stringent conditions compared to prior analyses.
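
To make concrete what "the same augmentations and contrastive losses" refers to, here is a minimal NumPy sketch of a generic InfoNCE-style contrastive loss (an illustration, not code from the paper): the loss sees only the representations of paired views, which is exactly the information the criticized analyses restrict themselves to.

```python
import numpy as np

def info_nce_loss(z1, z2, temperature=0.5):
    """Generic InfoNCE contrastive loss for two batches of representations
    z1, z2 of shape (batch, dim), where z1[i] and z2[i] come from two
    augmentations (views) of the same input."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature                     # pairwise similarities
    # Row i: column i is the positive pair, all other columns are negatives.
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
z1, z2 = rng.normal(size=(8, 16)), rng.normal(size=(8, 16))
print(info_nce_loss(z1, z2))
```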

Bayesian Structure Learning with Generative Flow Networks arxiv:2202.13903 📈 72

Tristan Deleu, António Góis, Chris Emezue, Mansi Rankawat, Simon Lacoste-Julien, Stefan Bauer, Yoshua Bengio

**Abstract:** In Bayesian structure learning, we are interested in inferring a distribution over the directed acyclic graph (DAG) structure of Bayesian networks, from data. Defining such a distribution is very challenging, due to the combinatorially large sample space, and approximations based on MCMC are often required. Recently, a novel class of probabilistic models, called Generative Flow Networks (GFlowNets), have been introduced as a general framework for generative modeling of discrete and composite objects, such as graphs. In this work, we propose to use a GFlowNet as an alternative to MCMC for approximating the posterior distribution over the structure of Bayesian networks, given a dataset of observations. Generating a sample DAG from this approximate distribution is viewed as a sequential decision problem, where the graph is constructed one edge at a time, based on learned transition probabilities. Through evaluation on both simulated and real data, we show that our approach, called DAG-GFlowNet, provides an accurate approximation of the posterior over DAGs, and it compares favorably against other methods based on MCMC or variational inference.
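
The sequential, edge-by-edge construction of a DAG is easy to picture with a toy sampler. The sketch below (not the authors' code) uses a fixed edge probability where DAG-GFlowNet would use learned transition probabilities, and simply skips any edge whose addition would create a cycle.

```python
import numpy as np

def reachable(adj, src, dst):
    """True if dst is reachable from src in the directed graph adj."""
    stack, seen = [src], set()
    while stack:
        u = stack.pop()
        if u == dst:
            return True
        if u in seen:
            continue
        seen.add(u)
        stack.extend(np.flatnonzero(adj[u]))
    return False

def sample_dag(n_nodes, edge_prob, rng):
    """Construct a DAG one edge at a time; an edge i -> j is only allowed if
    j cannot already reach i (which would close a cycle). DAG-GFlowNet learns
    these transition probabilities; here they are a fixed constant."""
    adj = np.zeros((n_nodes, n_nodes), dtype=int)
    for i in range(n_nodes):
        for j in range(n_nodes):
            if i != j and adj[i, j] == 0 and not reachable(adj, j, i):
                if rng.random() < edge_prob:
                    adj[i, j] = 1
    return adj

print(sample_dag(4, 0.3, np.random.default_rng(0)))
```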

Robust Training under Label Noise by Over-parameterization arxiv:2202.14026 📈 45

Sheng Liu, Zhihui Zhu, Qing Qu, Chong You

**Abstract:** Recently, over-parameterized deep networks, with increasingly more network parameters than training samples, have dominated the performances of modern machine learning. However, when the training data is corrupted, it has been well-known that over-parameterized networks tend to overfit and do not generalize. In this work, we propose a principled approach for robust training of over-parameterized deep networks in classification tasks where a proportion of training labels are corrupted. The main idea is yet very simple: label noise is sparse and incoherent with the network learned from clean data, so we model the noise and learn to separate it from the data. Specifically, we model the label noise via another sparse over-parameterization term, and exploit implicit algorithmic regularizations to recover and separate the underlying corruptions. Remarkably, when trained using such a simple method in practice, we demonstrate state-of-the-art test accuracy against label noise on a variety of real datasets. Furthermore, our experimental results are corroborated by theory on simplified linear models, showing that exact separation between sparse noise and low-rank data can be achieved under incoherent conditions. The work opens many interesting directions for improving over-parameterized models by using sparse over-parameterization and implicit regularization.
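
The "sparse over-parameterization" idea can be sketched in a few lines: give every training sample its own correction term, parameterized as a product of two learnable variables so that implicit regularization keeps it small and sparse. The PyTorch toy below is loosely in the spirit of that description; the exact parameterization, loss, and hyperparameters of the paper's method differ.

```python
import torch

# Toy sketch: a linear classifier plus a per-sample correction term u * v,
# trained jointly so the correction can absorb label noise. Illustrative only.
torch.manual_seed(0)
n, d, c = 200, 10, 3
X = torch.randn(n, d)
y = torch.randint(0, c, (n,))
y_onehot = torch.nn.functional.one_hot(y, c).float()

W = torch.zeros(d, c, requires_grad=True)
u = (0.01 * torch.randn(n, c)).requires_grad_()   # over-parameterized noise
v = (0.01 * torch.randn(n, c)).requires_grad_()   # term s = u * v stays small/sparse
opt = torch.optim.SGD([W, u, v], lr=0.1)

for _ in range(500):
    opt.zero_grad()
    pred = torch.softmax(X @ W, dim=1)
    # Ask prediction + sparse correction to explain the (possibly noisy) targets.
    loss = ((pred + u * v - y_onehot) ** 2).mean()
    loss.backward()
    opt.step()

print("mean |u*v|:", (u * v).abs().mean().item())
```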

OUR-GAN: One-shot Ultra-high-Resolution Generative Adversarial Networks arxiv:2202.13799 📈 45

Donghwee Yoon, Junseok Oh, Hayeong Choi, Minjae Yi, Injung Kim

**Abstract:** We propose OUR-GAN, the first one-shot ultra-high-resolution (UHR) image synthesis framework that generates non-repetitive images with 4K or higher resolution from a single training image. OUR-GAN generates a visually coherent image at low resolution and then gradually increases the resolution by super-resolution. Since OUR-GAN learns from a real UHR image, it can synthesize large-scale shapes with fine details while maintaining long-range coherence, which is difficult with conventional generative models that generate large images based on the patch distribution learned from relatively small images. OUR-GAN applies seamless subregion-wise super-resolution that synthesizes 4K or higher UHR images with limited memory, preventing discontinuity at the boundary. Additionally, OUR-GAN improves visual coherence while maintaining diversity by adding vertical positional embeddings to the feature maps. In experiments on the ST4K and RAISE datasets, OUR-GAN exhibited improved fidelity, visual coherency, and diversity compared with existing methods. The synthesized images are presented at https://anonymous-62348.github.io.

Resolving label uncertainty with implicit posterior models arxiv:2202.14000 📈 44

Esther Rolf, Nikolay Malkin, Alexandros Graikos, Ana Jojic, Caleb Robinson, Nebojsa Jojic

**Abstract:** We propose a method for jointly inferring labels across a collection of data samples, where each sample consists of an observation and a prior belief about the label. By implicitly assuming the existence of a generative model for which a differentiable predictor is the posterior, we derive a training objective that allows learning under weak beliefs. This formulation unifies various machine learning settings; the weak beliefs can come in the form of noisy or incomplete labels, likelihoods given by a different prediction mechanism on auxiliary input, or common-sense priors reflecting knowledge about the structure of the problem at hand. We demonstrate the proposed algorithms on diverse problems: classification with negative training examples, learning from rankings, weakly and self-supervised aerial imagery segmentation, co-segmentation of video frames, and coarsely supervised text classification.

Evaluating the Adversarial Robustness of Adaptive Test-time Defenses arxiv:2202.13711 📈 22

Francesco Croce, Sven Gowal, Thomas Brunner, Evan Shelhamer, Matthias Hein, Taylan Cemgil

**Abstract:** Adaptive defenses that use test-time optimization promise to improve robustness to adversarial examples. We categorize such adaptive test-time defenses and explain their potential benefits and drawbacks. In the process, we evaluate some of the latest proposed adaptive defenses (most of them published at peer-reviewed conferences). Unfortunately, none significantly improve upon static models when evaluated appropriately. Some even weaken the underlying static model while simultaneously increasing inference cost. While these results are disappointing, we still believe that adaptive test-time defenses are a promising avenue of research and, as such, we provide recommendations on evaluating such defenses. We go beyond the checklist provided by Carlini et al. (2019) by providing concrete steps that are specific to this type of defense.

On the Benefits of Large Learning Rates for Kernel Methods arxiv:2202.13733 📈 15

Gaspard Beugnot, Julien Mairal, Alessandro Rudi

**Abstract:** This paper studies an intriguing phenomenon related to the good generalization performance of estimators obtained by using large learning rates within gradient descent algorithms. First observed in the deep learning literature, we show that a phenomenon can be precisely characterized in the context of kernel methods, even though the resulting optimization problem is convex. Specifically, we consider the minimization of a quadratic objective in a separable Hilbert space, and show that with early stopping, the choice of learning rate influences the spectral decomposition of the obtained solution on the Hessian's eigenvectors. This extends an intuition described by Nakkiran (2020) on a two-dimensional toy problem to realistic learning scenarios such as kernel ridge regression. While large learning rates may be proven beneficial as soon as there is a mismatch between the train and test objectives, we further explain why it already occurs in classification tasks without assuming any particular mismatch between train and test data distributions.
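
For a quadratic objective the effect described above has a closed form: starting from zero, after t gradient steps the coordinate along a Hessian eigenvalue lam is (b / lam) * (1 - (1 - eta * lam)^t), so the learning rate eta and the stopping time jointly determine how much of each eigendirection is fit. A small NumPy check (the eigenvalues are illustrative, not from the paper):

```python
import numpy as np

# Gradient descent on 0.5 * x^T H x - b^T x with a diagonal Hessian.
lams = np.array([10.0, 1.0, 0.1])      # Hessian eigenvalues
t = 20                                  # early-stopping time
for eta in [0.02, 0.19]:                # both stable: eta < 2 / max(lams)
    frac_fit = 1 - (1 - eta * lams) ** t   # fraction of b/lam recovered per direction
    print(f"eta={eta}: fraction fit per eigendirection = {frac_fit.round(3)}")
```

The two learning rates distribute the fit across the three eigendirections in visibly different proportions, which is the spectral effect the paper makes precise for kernel methods.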

LISA: Learning Interpretable Skill Abstractions from Language arxiv:2203.00054 📈 12

Divyansh Garg, Skanda Vaidyanath, Kuno Kim, Jiaming Song, Stefano Ermon

**Abstract:** Learning policies that effectually utilize language instructions in complex, multi-task environments is an important problem in imitation learning. While it is possible to condition on the entire language instruction directly, such an approach could suffer from generalization issues. To encode complex instructions into skills that can generalize to unseen instructions, we propose Learning Interpretable Skill Abstractions (LISA), a hierarchical imitation learning framework that can learn diverse, interpretable skills from language-conditioned demonstrations. LISA uses vector quantization to learn discrete skill codes that are highly correlated with language instructions and the behavior of the learned policy. In navigation and robotic manipulation environments, LISA is able to outperform a strong non-hierarchical baseline in the low data regime and compose learned skills to solve tasks containing unseen long-range instructions. Our method demonstrates a more natural way to condition on language in sequential decision-making problems and achieve interpretable and controllable behavior with the learned skills.
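
The vector-quantization step at the core of the skill codes is a nearest-codebook lookup. A generic NumPy sketch (not the LISA implementation; straight-through gradients and codebook updates are omitted):

```python
import numpy as np

def quantize(z, codebook):
    """Map each continuous vector in z (batch, dim) to its nearest entry of
    codebook (n_codes, dim); returns discrete code indices and quantized vectors."""
    dists = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = dists.argmin(axis=1)
    return idx, codebook[idx]

rng = np.random.default_rng(0)
codebook = rng.normal(size=(20, 8))   # 20 candidate discrete skill codes
z = rng.normal(size=(5, 8))           # encoder outputs for 5 instructions
idx, z_quantized = quantize(z, codebook)
print(idx)
```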

Differentiable Matrix Elements with MadJax arxiv:2203.00057 📈 11

Lukas Heinrich, Michael Kagan

**Abstract:** MadJax is a tool for generating and evaluating differentiable matrix elements of high energy scattering processes. As such, it is a step towards a differentiable programming paradigm in high energy physics that facilitates the incorporation of high energy physics domain knowledge, encoded in simulation software, into gradient based learning and optimization pipelines. MadJax comprises two components: (a) a plugin to the general purpose matrix element generator MadGraph that integrates matrix element and phase space sampling code with the JAX differentiable programming framework, and (b) a standalone wrapping API for accessing the matrix element code and its gradients, which are computed with automatic differentiation. The MadJax implementation and example applications of simulation based inference and normalizing flow based matrix element modeling, with capabilities enabled uniquely with differentiable matrix elements, are presented.

Structure Extraction in Task-Oriented Dialogues with Slot Clustering arxiv:2203.00073 📈 10

Liang Qiu, Chien-Sheng Wu, Wenhao Liu, Caiming Xiong

**Abstract:** Extracting structure information from dialogue data can help us better understand user and system behaviors. In task-oriented dialogues, dialogue structure has often been considered as transition graphs among dialogue states. However, annotating dialogue states manually is expensive and time-consuming. In this paper, we propose a simple yet effective approach for structure extraction in task-oriented dialogues. We first detect and cluster possible slot tokens with a pre-trained model to approximate dialogue ontology for a target domain. Then we track the status of each identified token group and derive a state transition structure. Empirical results show that our approach outperforms unsupervised baseline models by far in dialogue structure extraction. In addition, we show that data augmentation based on extracted structures enriches the surface formats of training data and can achieve a significant performance boost in dialogue response generation.
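
The two steps — cluster candidate slot tokens, then track transitions between the resulting states — can be mocked up in a few lines. In the sketch below the embeddings are random stand-ins for the outputs of a pre-trained model, and one embedding per turn replaces the paper's token-level treatment.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
turn_embeddings = rng.normal(size=(50, 32))            # one vector per dialogue turn
states = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(turn_embeddings)

# Derive a state-transition structure by counting consecutive-turn transitions.
transitions = np.zeros((5, 5))
for a, b in zip(states[:-1], states[1:]):
    transitions[a, b] += 1
transitions /= transitions.sum(axis=1, keepdims=True).clip(min=1)
print(transitions.round(2))
```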

How and what to learn: The modes of machine learning arxiv:2202.13829 📈 8

Sihan Feng, Yong Zhang, Fuming Wang, Hong Zhao

**Abstract:** We propose a new approach, namely the weight pathway analysis (WPA), to study the mechanism of multilayer neural networks. The weight pathways linking neurons longitudinally from input neurons to output neurons are considered as the basic units of a neural network. We decompose a neural network into a series of subnetworks of weight pathways, and establish characteristic maps for these subnetworks. The parameters of a characteristic map can be visualized, providing a longitudinal perspective of the network and making the neural network explainable. Using WPA, we discover that a neural network stores and utilizes information in a "holographic" way, that is, the network encodes all training samples in a coherent structure. An input vector interacts with this "holographic" structure to enhance or suppress each subnetwork, and the subnetworks work together to produce the correct activities in the output neurons to recognize the input sample. Furthermore, with WPA, we reveal fundamental learning modes of a neural network: the linear learning mode and the nonlinear learning mode. The former extracts linearly separable features while the latter extracts linearly inseparable features. It is found that hidden-layer neurons self-organize into different classes in the later stages of the learning process. It is further discovered that the key strategy to improve the performance of a neural network is to control the ratio of the two learning modes to match that of the linear and the nonlinear features, and that increasing the width or the depth of a neural network helps to control this ratio. This provides theoretical ground for the practice of optimizing a neural network via increasing its width or its depth. The knowledge gained with WPA enables us to understand fundamental questions such as what to learn, how to learn, and how to learn well.

Rule-based Evolutionary Bayesian Learning arxiv:2202.13778 📈 8

Themistoklis Botsas, Lachlan R. Mason, Omar K. Matar, Indranil Pan

**Abstract:** In our previous work, we introduced the rule-based Bayesian Regression, a methodology that leverages two concepts: (i) Bayesian inference, for the general framework and uncertainty quantification and (ii) rule-based systems for the incorporation of expert knowledge and intuition. The resulting method creates a penalty equivalent to a common Bayesian prior, but it also includes information that typically would not be available within a standard Bayesian context. In this work, we extend the aforementioned methodology with grammatical evolution, a symbolic genetic programming technique that we utilise for automating the rules' derivation. Our motivation is that grammatical evolution can potentially detect patterns from the data with valuable information, equivalent to that of expert knowledge. We illustrate the use of the rule-based Evolutionary Bayesian learning technique by applying it to synthetic as well as real data, and examine the results in terms of point predictions and associated uncertainty.

Local and Global GANs with Semantic-Aware Upsampling for Image Generation arxiv:2203.00047 📈 7

Hao Tang, Ling Shao, Philip H. S. Torr, Nicu Sebe

**Abstract:** In this paper, we address the task of semantic-guided image generation. One challenge common to most existing image-level generation methods is the difficulty in generating small objects and detailed local textures. To address this, in this work we consider generating images using local context. As such, we design a local class-specific generative network using semantic maps as guidance, which separately constructs and learns subgenerators for different classes, enabling it to capture finer details. To learn more discriminative class-specific feature representations for the local generation, we also propose a novel classification module. To combine the advantages of both global image-level and local class-specific generation, a joint generation network is designed with an attention fusion module and a dual-discriminator structure embedded. Lastly, we propose a novel semantic-aware upsampling method, which has a larger receptive field and can take far-away pixels that are semantically related for feature upsampling, enabling it to better preserve semantic consistency for instances with the same semantic labels. Extensive experiments on two image generation tasks show the superior performance of the proposed method. State-of-the-art results are established by large margins on both tasks and on nine challenging public benchmarks. The source code and trained models are available at https://github.com/Ha0Tang/LGGAN.

Rectified Max-Value Entropy Search for Bayesian Optimization arxiv:2202.13597 📈 7

Quoc Phong Nguyen, Bryan Kian Hsiang Low, Patrick Jaillet

**Abstract:** Although the existing max-value entropy search (MES) is based on the widely celebrated notion of mutual information, its empirical performance can suffer due to two misconceptions whose implications on the exploration-exploitation trade-off are investigated in this paper. These issues are essential in the development of future acquisition functions and the improvement of the existing ones as they encourage an accurate measure of the mutual information such as the rectified MES (RMES) acquisition function we develop in this work. Unlike the evaluation of MES, we derive a closed-form probability density for the observation conditioned on the max-value and employ stochastic gradient ascent with reparameterization to efficiently optimize RMES. As a result of a more principled acquisition function, RMES shows a consistent improvement over MES in several synthetic function benchmarks and real-world optimization problems.

Interpretable Molecular Graph Generation via Monotonic Constraints arxiv:2203.00412 📈 6

Yuanqi Du, Xiaojie Guo, Amarda Shehu, Liang Zhao

**Abstract:** Designing molecules with specific properties is a long-lasting research problem and is central to advancing crucial domains such as drug discovery and material science. Recent advances in deep graph generative models treat molecule design as graph generation problems which provide new opportunities toward the breakthrough of this long-lasting problem. Existing models, however, have many shortcomings, including poor interpretability and controllability toward desired molecular properties. This paper focuses on new methodologies for molecule generation with interpretable and controllable deep generative models, by proposing new monotonically-regularized graph variational autoencoders. The proposed models learn to represent the molecules with latent variables and then learn the correspondence between them and molecule properties parameterized by polynomial functions. To further improve the interpretability and controllability of molecule generation towards desired properties, we derive new objectives which further enforce monotonicity of the relation between some latent variables and target molecule properties such as toxicity and clogP. Extensive experimental evaluation demonstrates the superiority of the proposed framework on accuracy, novelty, disentanglement, and control towards desired molecular properties. The code is open-source at https://anonymous.4open.science/r/MDVAE-FD2C.
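
One simple way to encourage a monotonic relation between a latent coordinate and a property is a pairwise ranking-style penalty; the PyTorch sketch below is such a generic regularizer, offered only to make the idea concrete, and is not the objective derived in the paper.

```python
import torch

def monotonicity_penalty(z, prop):
    """Penalize pairs whose ordering in the latent coordinate z (batch,)
    disagrees with their ordering in the property prop (batch,)."""
    dz = z[:, None] - z[None, :]
    dp = prop[:, None] - prop[None, :]
    return torch.relu(-dz * dp).mean()

z = torch.tensor([0.1, 0.5, 0.9])
prop = torch.tensor([1.0, 2.0, 1.5])   # not monotone in z, so the penalty is positive
print(monotonicity_penalty(z, prop))
```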

EdgeMixup: Improving Fairness for Skin Disease Classification and Segmentation arxiv:2202.13883 📈 6

Haolin Yuan, Armin Hadzic, William Paul, Daniella Villegas de Flores, Philip Mathew, John Aucott, Yinzhi Cao, Philippe Burlina

**Abstract:** Skin lesions can be an early indicator of a wide range of infectious and other diseases. The use of deep learning (DL) models to diagnose skin lesions has great potential in assisting clinicians with prescreening patients. However, these models often learn biases inherent in training data, which can lead to a performance gap in the diagnosis of people with light and/or dark skin tones. To the best of our knowledge, limited work has been done on identifying, let alone reducing, model bias in skin disease classification and segmentation. In this paper, we examine DL fairness and demonstrate the existence of bias in classification and segmentation models for subpopulations with darker skin tones compared to individuals with lighter skin tones, for specific diseases including Lyme, Tinea Corporis and Herpes Zoster. Then, we propose a novel preprocessing data-alteration method, called EdgeMixup, which improves model fairness via a linear combination of an input skin lesion image and a corresponding predicted edge detection mask, combined with a color saturation alteration. For the task of skin disease classification, EdgeMixup outperforms much more complex competing methods such as adversarial approaches, achieving a 10.99% reduction in the accuracy gap between light and dark skin tone samples, and resulting in 8.4% improved performance for an underrepresented subpopulation.
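
The core blending operation is a convex combination of the image with an edge map. The sketch below uses a crude gradient-magnitude edge detector on a grayscale image as a stand-in for the predicted edge mask, and omits the color-saturation alteration; it illustrates the idea rather than reproducing EdgeMixup.

```python
import numpy as np

def edge_mix(image, alpha=0.7):
    """Blend a grayscale image with a normalized gradient-magnitude edge map:
    out = alpha * image + (1 - alpha) * edges. alpha is illustrative."""
    gy, gx = np.gradient(image.astype(float))
    edges = np.hypot(gx, gy)
    edges /= edges.max() + 1e-8
    return alpha * image + (1 - alpha) * edges

img = np.random.default_rng(0).random((64, 64))
print(edge_mix(img).shape)
```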

Weakly Supervised Disentangled Representation for Goal-conditioned Reinforcement Learning arxiv:2202.13624 📈 6

Zhifeng Qian, Mingyu You, Hongjun Zhou, Bin He

**Abstract:** Goal-conditioned reinforcement learning is a crucial yet challenging algorithm which enables agents to achieve multiple user-specified goals when learning a set of skills in a dynamic environment. However, it typically requires millions of the environmental interactions explored by agents, which is sample-inefficient. In the paper, we propose a skill learning framework DR-GRL that aims to improve the sample efficiency and policy generalization by combining the Disentangled Representation learning and Goal-conditioned visual Reinforcement Learning. In a weakly supervised manner, we propose a Spatial Transform AutoEncoder (STAE) to learn an interpretable and controllable representation in which different parts correspond to different object attributes (shape, color, position). Due to the high controllability of the representations, STAE can simply recombine and recode the representations to generate unseen goals for agents to practice themselves. The manifold structure of the learned representation maintains consistency with the physical position, which is beneficial for reward calculation. We empirically demonstrate that DR-GRL significantly outperforms the previous methods in sample efficiency and policy generalization. In addition, DR-GRL is also easy to expand to the real robot.

Bandit Learning with General Function Classes: Heteroscedastic Noise and Variance-dependent Regret Bounds arxiv:2202.13603 📈 6

Heyang Zhao, Dongruo Zhou, Jiafan He, Quanquan Gu

**Abstract:** We consider learning a stochastic bandit model, where the reward function belongs to a general class of uniformly bounded functions, and the additive noise can be heteroscedastic. Our model captures contextual linear bandits and generalized linear bandits as special cases. While previous works (Kirschner and Krause, 2018; Zhou et al., 2021) based on weighted ridge regression can deal with linear bandits with heteroscedastic noise, they are not directly applicable to our general model due to the curse of nonlinearity. In order to tackle this problem, we propose a multi-level learning framework for the general bandit model. The core idea of our framework is to partition the observed data into different levels according to the variance of their respective reward and perform online learning at each level collaboratively. Under our framework, we first design an algorithm that constructs the variance-aware confidence set based on empirical risk minimization and prove a variance-dependent regret bound. For generalized linear bandits, we further propose an algorithm based on follow-the-regularized-leader (FTRL) subroutine and online-to-confidence-set conversion, which can achieve a tighter variance-dependent regret under certain conditions.

Private Frequency Estimation via Projective Geometry arxiv:2203.00194 📈 5

Vitaly Feldman, Jelani Nelson, Huy Lê Nguyen, Kunal Talwar

**Abstract:** In this work, we propose a new algorithm ProjectiveGeometryResponse (PGR) for locally differentially private (LDP) frequency estimation. For a universe size of $k$ and with $n$ users, our $\varepsilon$-LDP algorithm has communication cost $\lceil\log_2k\rceil$ bits in the private coin setting and $\varepsilon\log_2 e + O(1)$ in the public coin setting, and has computation cost $O(n + k\exp(\varepsilon) \log k)$ for the server to approximately reconstruct the frequency histogram, while achieving the state-of-the-art privacy-utility tradeoff. In many parameter settings used in practice this is a significant improvement over the $ O(n+k^2)$ computation cost that is achieved by the recent PI-RAPPOR algorithm (Feldman and Talwar; 2021). Our empirical evaluation shows a speedup of over 50x over PI-RAPPOR while using approximately 75x less memory for practically relevant parameter settings. In addition, the running time of our algorithm is within an order of magnitude of HadamardResponse (Acharya, Sun, and Zhang; 2019) and RecursiveHadamardResponse (Chen, Kairouz, and Ozgur; 2020) which have significantly worse reconstruction error. The error of our algorithm essentially matches that of the communication- and time-inefficient but utility-optimal SubsetSelection (SS) algorithm (Ye and Barg; 2017). Our new algorithm is based on using Projective Planes over a finite field to define a small collection of sets that are close to being pairwise independent and a dynamic programming algorithm for approximate histogram reconstruction on the server side. We also give an extension of PGR, which we call HybridProjectiveGeometryResponse, that allows trading off computation time with utility smoothly.

Preemptive Motion Planning for Human-to-Robot Indirect Placement Handovers arxiv:2203.00156 📈 5

Andrew Choi, Mohammad Khalid Jawed, Jungseock Joo

**Abstract:** As technology advances, the need for safe, efficient, and collaborative human-robot teams has become increasingly important. One of the most fundamental collaborative tasks in any setting is the object handover. Human-to-robot handovers can take either of two approaches: (1) direct hand-to-hand or (2) indirect hand-to-placement-to-pick-up. The latter approach ensures minimal contact between the human and robot but can also result in increased idle time due to having to wait for the object to first be placed down on a surface. To minimize such idle time, the robot must preemptively predict the human intent of where the object will be placed. Furthermore, for the robot to preemptively act in any sort of productive manner, predictions and motion planning must occur in real-time. We introduce a novel prediction-planning pipeline that allows the robot to preemptively move towards the human agent's intended placement location using gaze and gestures as model inputs. In this paper, we investigate the performance and drawbacks of our early intent predictor-planner as well as the practical benefits of using such a pipeline through a human-robot case study.

Predicting the Thermal Sunyaev-Zel'dovich Field using Modular and Equivariant Set-Based Neural Networks arxiv:2203.00026 📈 5

Leander Thiele, Miles Cranmer, William Coulton, Shirley Ho, David N. Spergel

**Abstract:** Theoretical uncertainty limits our ability to extract cosmological information from baryonic fields such as the thermal Sunyaev-Zel'dovich (tSZ) effect. Being sourced by the electron pressure field, the tSZ effect depends on baryonic physics that is usually modeled by expensive hydrodynamic simulations. We train neural networks on the IllustrisTNG-300 cosmological simulation to predict the continuous electron pressure field in galaxy clusters from gravity-only simulations. Modeling clusters is challenging for neural networks as most of the gas pressure is concentrated in a handful of voxels and even the largest hydrodynamical simulations contain only a few hundred clusters that can be used for training. Instead of conventional convolutional neural net (CNN) architectures, we choose to employ a rotationally equivariant DeepSets architecture to operate directly on the set of dark matter particles. We argue that set-based architectures provide distinct advantages over CNNs. For example, we can enforce exact rotational and permutation equivariance, incorporate existing knowledge on the tSZ field, and work with sparse fields as are standard in cosmology. We compose our architecture with separate, physically meaningful modules, making it amenable to interpretation. For example, we can separately study the influence of local and cluster-scale environment, determine that cluster triaxiality has negligible impact, and train a module that corrects for mis-centering. Our model improves by 70 % on analytic profiles fit to the same simulation data. We argue that the electron pressure field, viewed as a function of a gravity-only simulation, has inherent stochasticity, and model this property through a conditional-VAE extension to the network. This modification yields further improvement by 7 %, it is limited by our small training set however. (abridged)
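
The permutation-invariant backbone is the standard DeepSets pattern: a shared per-particle encoder, an order-independent pooling, and a decoder. A minimal NumPy version follows (linear maps for brevity; the paper's modules are richer and additionally rotationally equivariant).

```python
import numpy as np

def deepsets_forward(particles, W_phi, W_rho):
    """phi: shared per-particle encoder; sum-pool over the set; rho: decoder."""
    h = np.maximum(particles @ W_phi, 0.0)   # per-particle features (ReLU)
    pooled = h.sum(axis=0)                   # permutation-invariant aggregation
    return pooled @ W_rho                    # set-level prediction

rng = np.random.default_rng(0)
particles = rng.normal(size=(1000, 3))       # e.g. dark matter particle positions
W_phi, W_rho = rng.normal(size=(3, 16)), rng.normal(size=(16, 1))
out = deepsets_forward(particles, W_phi, W_rho)
out_shuffled = deepsets_forward(particles[rng.permutation(1000)], W_phi, W_rho)
print(np.allclose(out, out_shuffled))        # True: output ignores particle order
```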

A Proximal Algorithm for Sampling arxiv:2202.13975 📈 5

Jiaming Liang, Yongxin Chen

**Abstract:** We consider sampling problems with possibly non-smooth potentials (negative log-densities). In particular, we study two specific settings of sampling where the convex potential is either semi-smooth or in composite form as the sum of a smooth component and a semi-smooth component. To overcome the challenges caused by the non-smoothness, we propose a Markov chain Monte Carlo algorithm that resembles proximal methods in optimization for these sampling tasks. The key component of our method is a sampling scheme for a quadratically regularized target potential. This scheme relies on rejection sampling with a carefully designed Gaussian proposal whose center is an approximate minimizer of the regularized potential. We develop a novel technique (a modified Gaussian integral) to bound the complexity of this rejection sampling scheme in spite of the non-smoothness in the potentials. We then combine this scheme with the alternating sampling framework (ASF), which uses Gibbs sampling on an augmented distribution, to accomplish the two settings of sampling tasks we consider. Furthermore, by combining the complexity bound of the rejection sampling we develop and the remarkable convergence properties of ASF discovered recently, we are able to establish several non-asymptotic complexity bounds for our algorithm, in terms of the total number of queries of subgradient of the target potential. Our algorithm achieves state-of-the-art complexity bounds compared with all existing methods in the same settings.

Recent Advances and Challenges in Deep Audio-Visual Correlation Learning arxiv:2202.13673 📈 5

Luís Vilaça, Yi Yu, Paula Viana

**Abstract:** Audio-visual correlation learning aims to capture essential correspondences and understand natural phenomena between audio and video. With the rapid growth of deep learning, an increasing amount of attention has been paid to this emerging research issue. Over the past few years, various methods and datasets have been proposed for audio-visual correlation learning, which motivates us to compile a comprehensive survey. This survey paper focuses on state-of-the-art (SOTA) models used to learn correlations between audio and video, and also discusses task definitions and paradigms applied in AI multimedia. In addition, we investigate some objective functions frequently used for optimizing audio-visual correlation learning models and discuss how audio-visual data is exploited in the optimization process. Most importantly, we provide an extensive comparison and summarization of the recent progress of SOTA audio-visual correlation learning and discuss future research directions.

An Effective Graph Learning based Approach for Temporal Link Prediction: The First Place of WSDM Cup 2022 arxiv:2203.01820 📈 4

Qian Zhao, Shuo Yang, Binbin Hu, Zhiqiang Zhang, Yakun Wang, Yusong Chen, Jun Zhou, Chuan Shi

**Abstract:** Temporal link prediction, one of the most crucial tasks on temporal graphs, has attracted considerable attention from the research community. The WSDM Cup 2022 seeks solutions that predict the existence probabilities of edges within time spans over a temporal graph. This paper introduces the solution of AntGraph, which won first place in the competition. We first analyze the theoretical upper bound of performance by removing temporal information, which implies that structure and attribute information on the graph alone can achieve great performance. Based on this hypothesis, we then introduce several well-designed features. Finally, experiments conducted on the competition datasets show the superiority of our proposal, which achieved an AUC score of 0.666 on dataset A and 0.902 on dataset B; the ablation studies also demonstrate the effectiveness of each feature. Code is publicly available at https://github.com/im0qianqian/WSDM2022TGP-AntGraph.

Investigating Selective Prediction Approaches Across Several Tasks in IID, OOD, and Adversarial Settings arxiv:2203.00211 📈 4

Neeraj Varshney, Swaroop Mishra, Chitta Baral

**Abstract:** In order to equip NLP systems with selective prediction capability, several task-specific approaches have been proposed. However, which approaches work best across tasks or even if they consistently outperform the simplest baseline 'MaxProb' remains to be explored. To this end, we systematically study 'selective prediction' in a large-scale setup of 17 datasets across several NLP tasks. Through comprehensive experiments under in-domain (IID), out-of-domain (OOD), and adversarial (ADV) settings, we show that despite leveraging additional resources (held-out data/computation), none of the existing approaches consistently and considerably outperforms MaxProb in all three settings. Furthermore, their performance does not translate well across tasks. For instance, Monte-Carlo Dropout outperforms all other approaches on Duplicate Detection datasets but does not fare well on NLI datasets, especially in the OOD setting. Thus, we recommend that future selective prediction approaches should be evaluated across tasks and settings for reliable estimation of their capabilities.
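
For reference, the MaxProb baseline the paper compares against takes only a few lines: answer when the top softmax probability clears a threshold, abstain otherwise (the threshold value below is illustrative).

```python
import numpy as np

def maxprob_selective(probs, threshold=0.8):
    """Return predicted classes, with -1 meaning 'abstain'."""
    confidence = probs.max(axis=1)
    preds = probs.argmax(axis=1)
    return np.where(confidence >= threshold, preds, -1)

probs = np.array([[0.90, 0.05, 0.05],
                  [0.40, 0.35, 0.25]])
print(maxprob_selective(probs))   # [0, -1]
```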

Semi-supervised Deep Learning for Image Classification with Distribution Mismatch: A Survey arxiv:2203.00190 📈 4

Saul Calderon-Ramirez, Shengxiang Yang, David Elizondo

**Abstract:** Deep learning methodologies have been employed in several different fields, with an outstanding success in image recognition applications, such as material quality control, medical imaging, autonomous driving, etc. Deep learning models rely on the abundance of labelled observations to train a prospective model. These models are composed of millions of parameters to estimate, increasing the need for more training observations. Frequently it is expensive to gather labelled observations of data, making the usage of deep learning models not ideal, as the model might over-fit the data. In a semi-supervised setting, unlabelled data is used to improve the levels of accuracy and generalization of a model with small labelled datasets. Nevertheless, in many situations different unlabelled data sources might be available. This raises the risk of a significant distribution mismatch between the labelled and unlabelled datasets. Such phenomena can cause a considerable performance hit to typical semi-supervised deep learning frameworks, which often assume that both labelled and unlabelled datasets are drawn from similar distributions. Therefore, in this paper we study the latest approaches for semi-supervised deep learning for image recognition. Emphasis is placed on semi-supervised deep learning models designed to deal with a distribution mismatch between the labelled and unlabelled datasets. We address open challenges with the aim to encourage the community to tackle them, and overcome the high data demand of traditional deep learning pipelines under real-world usage settings.

Semantic Sentence Composition Reasoning for Multi-Hop Question Answering arxiv:2203.00160 📈 4

Qianglong Chen

**Abstract:** Due to insufficient data, existing multi-hop open-domain question answering systems need to effectively find relevant supporting facts for each question. To alleviate the challenges of semantic factual sentence retrieval and multi-hop context expansion, we present a semantic sentence composition reasoning approach for the multi-hop question answering task, which consists of two key modules: a multi-stage semantic matching module (MSSM) and a factual sentence composition module (FSC). With the combination of factual sentences and multi-stage semantic retrieval, our approach can provide more comprehensive contextual information for model training and reasoning. Experimental results demonstrate that our model is able to incorporate existing pre-trained language models and outperform the existing SOTA method on the QASC task with an improvement of about 9%.

MRI-GAN: A Generalized Approach to Detect DeepFakes using Perceptual Image Assessment arxiv:2203.00108 📈 4

Pratikkumar Prajapati, Chris Pollett

**Abstract:** DeepFakes are synthetic videos generated by swapping a face of an original image with the face of somebody else. In this paper, we describe our work to develop general, deep learning-based models to classify DeepFake content. We propose a novel framework for using Generative Adversarial Network (GAN)-based models, which we call MRI-GAN, that utilizes perceptual differences in images to detect synthesized videos. We test our MRI-GAN approach and a plain-frames-based model using the DeepFake Detection Challenge Dataset. Our plain-frames-based model achieves 91% test accuracy and a model which uses our MRI-GAN framework with the Structural Similarity Index Measure (SSIM) for the perceptual differences achieves 74% test accuracy. The results of MRI-GAN are preliminary and may be improved further by modifying the choice of loss function, tuning hyper-parameters, or by using a more advanced perceptual similarity metric.

Deep Camera Pose Regression Using Pseudo-LiDAR arxiv:2203.00080 📈 4

Ali Raza, Lazar Lolic, Shahmir Akhter, Alfonso Dela Cruz, Michael Liut

**Abstract:** An accurate and robust large-scale localization system is an integral component for active areas of research such as autonomous vehicles and augmented reality. To this end, many learning algorithms have been proposed that predict 6DOF camera pose from RGB or RGB-D images. However, previous methods that incorporate depth typically treat the data the same way as RGB images, often adding depth maps as additional channels to RGB images and passing them through convolutional neural networks (CNNs). In this paper, we show that converting depth maps into pseudo-LiDAR signals, previously shown to be useful for 3D object detection, is a better representation for camera localization tasks by projecting point clouds that can accurately determine 6DOF camera pose. This is demonstrated by first comparing localization accuracies of a network operating exclusively on pseudo-LiDAR representations, with networks operating exclusively on depth maps. We then propose FusionLoc, a novel architecture that uses pseudo-LiDAR to regress a 6DOF camera pose. FusionLoc is a dual stream neural network, which aims to remedy common issues with typical 2D CNNs operating on RGB-D images. The results from this architecture are compared against various other state-of-the-art deep pose regression implementations using the 7 Scenes dataset. The findings are that FusionLoc performs better than a number of other camera localization methods, with a notable improvement being, on average, 0.33m and 4.35° more accurate than RGB-D PoseNet. By proving the validity of using pseudo-LiDAR signals over depth maps for localization, there are new considerations when implementing large-scale localization systems.
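
The pseudo-LiDAR conversion itself is the standard pinhole back-projection of a depth map into a 3D point cloud; a short sketch with placeholder intrinsics:

```python
import numpy as np

def depth_to_pseudo_lidar(depth, fx, fy, cx, cy):
    """Back-project a depth map (H, W) into an (H*W, 3) point cloud using the
    pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy, Z = depth."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

depth = np.full((480, 640), 2.0)   # toy scene: a flat wall 2 m away
cloud = depth_to_pseudo_lidar(depth, fx=525.0, fy=525.0, cx=319.5, cy=239.5)
print(cloud.shape)                 # (307200, 3)
```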

Multi-modal Alignment using Representation Codebook arxiv:2203.00048 📈 4

Jiali Duan, Liqun Chen, Son Tran, Jinyu Yang, Yi Xu, Belinda Zeng, Trishul Chilimbi

**Abstract:** Aligning signals from different modalities is an important step in vision-language representation learning as it affects the performance of later stages such as cross-modality fusion. Since image and text typically reside in different regions of the feature space, directly aligning them at instance level is challenging especially when features are still evolving during training. In this paper, we propose to align at a higher and more stable level using cluster representation. Specifically, we treat image and text as two "views" of the same entity, and encode them into a joint vision-language coding space spanned by a dictionary of cluster centers (codebook). We contrast positive and negative samples via their cluster assignments while simultaneously optimizing the cluster centers. To further smooth out the learning process, we adopt a teacher-student distillation paradigm, where the momentum teacher of one view guides the student learning of the other. We evaluated our approach on common vision language benchmarks and obtain new SoTA on zero-shot cross modality retrieval while being competitive on various other transfer tasks.

A Recurrent Differentiable Engine for Modeling Tensegrity Robots Trainable with Low-Frequency Data arxiv:2203.00041 📈 4

Kun Wang, Mridul Aanjaneya, Kostas Bekris

**Abstract:** Tensegrity robots, composed of rigid rods and flexible cables, are difficult to accurately model and control given the presence of complex dynamics and high number of DoFs. Differentiable physics engines have been recently proposed as a data-driven approach for model identification of such complex robotic systems. These engines are often executed at a high-frequency to achieve accurate simulation. Ground truth trajectories for training differentiable engines, however, are not typically available at such high frequencies due to limitations of real-world sensors. The present work focuses on this frequency mismatch, which impacts the modeling accuracy. We proposed a recurrent structure for a differentiable physics engine of tensegrity robots, which can be trained effectively even with low-frequency trajectories. To train this new recurrent engine in a robust way, this work introduces relative to prior work: (i) a new implicit integration scheme, (ii) a progressive training pipeline, and (iii) a differentiable collision checker. A model of NASA's icosahedron SUPERballBot on MuJoCo is used as the ground truth system to collect training data. Simulated experiments show that once the recurrent differentiable engine has been trained given the low-frequency trajectories from MuJoCo, it is able to match the behavior of MuJoCo's system. The criterion for success is whether a locomotion strategy learned using the differentiable engine can be transferred back to the ground-truth system and result in a similar motion. Notably, the amount of ground truth data needed to train the differentiable engine, such that the policy is transferable to the ground truth system, is 1% of the data needed to train the policy directly on the ground-truth system.

Spatio-temporal Vision Transformer for Super-resolution Microscopy arxiv:2203.00030 📈 4

Charles N. Christensen, Meng Lu, Edward N. Ward, Pietro Lio, Clemens F. Kaminski

**Abstract:** Structured illumination microscopy (SIM) is an optical super-resolution technique that enables live-cell imaging beyond the diffraction limit. Reconstruction of SIM data is prone to artefacts, which becomes problematic when imaging highly dynamic samples because previous methods rely on the assumption that samples are static. We propose a new transformer-based reconstruction method, VSR-SIM, that uses shifted 3-dimensional window multi-head attention in addition to channel attention mechanism to tackle the problem of video super-resolution (VSR) in SIM. The attention mechanisms are found to capture motion in sequences without the need for common motion estimation techniques such as optical flow. We take an approach to training the network that relies solely on simulated data using videos of natural scenery with a model for SIM image formation. We demonstrate a use case enabled by VSR-SIM referred to as rolling SIM imaging, which increases temporal resolution in SIM by a factor of 9. Our method can be applied to any SIM setup enabling precise recordings of dynamic processes in biomedical research with high temporal resolution.

ParaNames: A Massively Multilingual Entity Name Corpus arxiv:2202.14035 📈 4

Jonne Sälevä, Constantine Lignos

**Abstract:** This preprint describes work in progress on ParaNames, a multilingual parallel name resource consisting of names for approximately 14 million entities. The included names span over 400 languages, and almost all entities are mapped to standardized entity types (PER/LOC/ORG). Using Wikidata as a source, we create the largest resource of this type to-date. We describe our approach to filtering and standardizing the data to provide the best quality possible. ParaNames is useful for multilingual language processing, both in defining tasks for name translation/transliteration and as supplementary data for tasks such as named entity recognition and linking. Our resource is released on GitHub (https://github.com/bltlab/paranames) under a Creative Commons license (CC BY 4.0).

Domain Knowledge-Informed Self-Supervised Representations for Workout Form Assessment arxiv:2202.14019 📈 4

Paritosh Parmar, Amol Gharat, Helge Rhodin

**Abstract:** Maintaining proper form while exercising is important for preventing injuries and maximizing muscle mass gains. While fitness apps are becoming popular, they lack the functionality to detect errors in workout form. Detecting such errors naturally requires estimating users' body pose. However, off-the-shelf pose estimators struggle to perform well on the videos recorded in gym scenarios due to factors such as camera angles, occlusion from gym equipment, illumination, and clothing. To aggravate the problem, the errors to be detected in the workouts are very subtle. To that end, we propose to learn exercise-specific representations from unlabeled samples such that a small dataset annotated by experts suffices for supervised error detection. In particular, our domain knowledge-informed self-supervised approaches exploit the harmonic motion of the exercise actions, and capitalize on the large variances in camera angles, clothes, and illumination to learn powerful representations. To facilitate our self-supervised pretraining and supervised finetuning, we curated a new exercise dataset, Fitness-AQA, comprising three exercises: BackSquat, BarbellRow, and OverheadPress. It has been annotated by expert trainers for multiple crucial and typically occurring exercise errors. Experimental results show that our self-supervised representations outperform off-the-shelf 2D- and 3D-pose estimators and several other baselines.

Precision-medicine-toolbox: An open-source python package for facilitation of quantitative medical imaging and radiomics analysis arxiv:2202.13965 📈 4

Sergey Primakov, Elizaveta Lavrova, Zohaib Salahuddin, Henry C Woodruff, Philippe Lambin

**Abstract:** Medical image analysis plays a key role in precision medicine as it allows the clinicians to identify anatomical abnormalities and it is routinely used in clinical assessment. Data curation and pre-processing of medical images are critical steps in the quantitative medical image analysis that can have a significant impact on the resulting model performance. In this paper, we introduce a precision-medicine-toolbox that allows researchers to perform data curation, image pre-processing and handcrafted radiomics extraction (via Pyradiomics) and feature exploration tasks with Python. With this open-source solution, we aim to address the data preparation and exploration problem, bridge the gap between the currently existing packages, and improve the reproducibility of quantitative medical imaging research.

Background Mixup Data Augmentation for Hand and Object-in-Contact Detection arxiv:2202.13941 📈 4

Koya Tango, Takehiko Ohkawa, Ryosuke Furuta, Yoichi Sato

**Abstract:** Detecting the positions of human hands and objects-in-contact (hand-object detection) in each video frame is vital for understanding human activities from videos. For training an object detector, a method called Mixup, which overlays two training images to mitigate data bias, has been empirically shown to be effective for data augmentation. However, in hand-object detection, mixing two hand-manipulation images produces unintended biases, e.g., the concentration of hands and objects in a specific region degrades the ability of the hand-object detector to identify object boundaries. We propose a data-augmentation method called Background Mixup that leverages data-mixing regularization while reducing the unintended effects in hand-object detection. Instead of mixing two images where a hand and an object in contact appear, we mix a target training image with background images without hands and objects-in-contact extracted from external image sources, and use the mixed images for training the detector. Our experiments demonstrated that the proposed method can effectively reduce false positives and improve the performance of hand-object detection in both supervised and semi-supervised learning settings.
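
The mixing step itself is a convex combination of a training image with a hand-free background image; the labels of the target image are reused unchanged because the background contributes no hands or objects-in-contact. A schematic sketch (mixing ratio and shapes are illustrative):

```python
import numpy as np

def background_mixup(target_img, background_img, lam=0.7):
    """Blend a hand-object training image with a background image that
    contains no hands or objects-in-contact."""
    assert target_img.shape == background_img.shape
    return lam * target_img + (1.0 - lam) * background_img

rng = np.random.default_rng(0)
target = rng.random((224, 224, 3))
background = rng.random((224, 224, 3))
print(background_mixup(target, background).shape)
```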

Pessimistic Q-Learning for Offline Reinforcement Learning: Towards Optimal Sample Complexity arxiv:2202.13890 📈 4

Laixi Shi, Gen Li, Yuting Wei, Yuxin Chen, Yuejie Chi

**Abstract:** Offline or batch reinforcement learning seeks to learn a near-optimal policy using history data without active exploration of the environment. To counter the insufficient coverage and sample scarcity of many offline datasets, the principle of pessimism has been recently introduced to mitigate high bias of the estimated values. While pessimistic variants of model-based algorithms (e.g., value iteration with lower confidence bounds) have been theoretically investigated, their model-free counterparts -- which do not require explicit model estimation -- have not been adequately studied, especially in terms of sample efficiency. To address this inadequacy, we study a pessimistic variant of Q-learning in the context of finite-horizon Markov decision processes, and characterize its sample complexity under the single-policy concentrability assumption which does not require the full coverage of the state-action space. In addition, a variance-reduced pessimistic Q-learning algorithm is proposed to achieve near-optimal sample complexity. Altogether, this work highlights the efficiency of model-free algorithms in offline RL when used in conjunction with pessimism and variance reduction.
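
The pessimism principle amounts to subtracting a penalty that shrinks with how often a state-action pair appears in the offline data. The toy update below illustrates a lower-confidence-bound style Q-learning step; it is schematic and not the paper's exact algorithm, penalty, or constants.

```python
import numpy as np

def pessimistic_q_update(q, counts, s, a, r, s_next, gamma=0.99, lr=0.1, beta=1.0):
    """One Q-learning update with a count-based pessimism penalty."""
    counts[s, a] += 1
    penalty = beta / np.sqrt(counts[s, a])          # larger when data is scarce
    target = r + gamma * q[s_next].max() - penalty  # lower-confidence-bound target
    q[s, a] += lr * (target - q[s, a])
    return q

q = np.zeros((5, 2))
counts = np.zeros((5, 2))
q = pessimistic_q_update(q, counts, s=0, a=1, r=1.0, s_next=2)
print(q[0])
```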

On the Robustness of CountSketch to Adaptive Inputs arxiv:2202.13736 📈 4

Edith Cohen, Xin Lyu, Jelani Nelson, Tamás Sarlós, Moshe Shechner, Uri Stemmer

**Abstract:** CountSketch is a popular dimensionality reduction technique that maps vectors to a lower dimension using randomized linear measurements. The sketch supports recovering $\ell_2$-heavy hitters of a vector (entries with $v[i]^2 \geq \frac{1}{k}\|\boldsymbol{v}\|^2_2$). We study the robustness of the sketch in adaptive settings where input vectors may depend on the output from prior inputs. Adaptive settings arise in processes with feedback or with adversarial attacks. We show that the classic estimator is not robust, and can be attacked with a number of queries of the order of the sketch size. We propose a robust estimator (for a slightly modified sketch) that allows for quadratic number of queries in the sketch size, which is an improvement factor of $\sqrt{k}$ (for $k$ heavy hitters) over prior work.
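
For readers unfamiliar with the sketch itself, here is a textbook CountSketch with the classic (non-robust) median estimator that the attack described above targets; the parameters are illustrative.

```python
import numpy as np

class CountSketch:
    """d rows, each hashing coordinates into b buckets with random signs;
    point queries take the median of the signed bucket values across rows."""
    def __init__(self, dim, rows=5, buckets=256, seed=0):
        rng = np.random.default_rng(seed)
        self.h = rng.integers(0, buckets, size=(rows, dim))  # bucket hashes
        self.s = rng.choice([-1, 1], size=(rows, dim))       # sign hashes
        self.table = np.zeros((rows, buckets))
        self.rows = np.arange(rows)

    def update(self, i, delta=1.0):
        self.table[self.rows, self.h[:, i]] += self.s[:, i] * delta

    def estimate(self, i):
        return np.median(self.s[:, i] * self.table[self.rows, self.h[:, i]])

cs = CountSketch(dim=10_000)
for _ in range(500):
    cs.update(42)                 # a heavy coordinate
print(cs.estimate(42))            # close to 500
```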

Confidence Based Bidirectional Global Context Aware Training Framework for Neural Machine Translation arxiv:2202.13663 📈 4

Chulun Zhou, Fandong Meng, Jie Zhou, Min Zhang, Hongji Wang, Jinsong Su

**Abstract:** Most dominant neural machine translation (NMT) models are restricted to make predictions only according to the local context of preceding words in a left-to-right manner. Although many previous studies try to incorporate global information into NMT models, there still exist limitations on how to effectively exploit bidirectional global context. In this paper, we propose a Confidence Based Bidirectional Global Context Aware (CBBGCA) training framework for NMT, where the NMT model is jointly trained with an auxiliary conditional masked language model (CMLM). The training consists of two stages: (1) multi-task joint training; (2) confidence based knowledge distillation. At the first stage, by sharing encoder parameters, the NMT model is additionally supervised by the signal from the CMLM decoder that contains bidirectional global contexts. Moreover, at the second stage, using the CMLM as teacher, we further pertinently incorporate bidirectional global context to the NMT model on its unconfidently-predicted target words via knowledge distillation. Experimental results show that our proposed CBBGCA training framework significantly improves the NMT model by +1.02, +1.30 and +0.57 BLEU scores on three large-scale translation datasets, namely WMT'14 English-to-German, WMT'19 Chinese-to-English and WMT'14 English-to-French, respectively.

Avalanche RL: a Continual Reinforcement Learning Library arxiv:2202.13657 📈 4

Nicolò Lucchesi, Antonio Carta, Vincenzo Lomonaco

**Abstract:** Continual Reinforcement Learning (CRL) is a challenging setting where an agent learns to interact with an environment that is constantly changing over time (the stream of experiences). In this paper, we describe Avalanche RL, a library for Continual Reinforcement Learning which makes it easy to train agents on a continuous stream of tasks. Avalanche RL is based on PyTorch and supports any OpenAI Gym environment. Its design is based on Avalanche, one of the more popular continual learning libraries, which allows us to reuse a large number of continual learning strategies and improves the interaction between reinforcement learning and continual learning researchers. Additionally, we propose Continual Habitat-Lab, a novel benchmark and high-level library which enables the use of the photorealistic simulator Habitat-Sim for CRL research. Overall, Avalanche RL attempts to unify continual reinforcement learning applications under a common framework, which we hope will foster the growth of the field.

Hierarchical Multi-Agent DRL-Based Framework for Joint Multi-RAT Assignment and Dynamic Resource Allocation in Next-Generation HetNets arxiv:2202.13652 📈 4

Abdulmalik Alwarafy, Bekir Sait Ciftler, Mohamed Abdallah, Mounir Hamdi, Naofal Al-Dhahir

**Abstract:** This paper considers the problem of cost-aware downlink sum-rate maximization via joint optimal radio access technologies (RATs) assignment and power allocation in next-generation heterogeneous wireless networks (HetNets). We consider a future HetNet comprised of multiple RATs and serving multi-connectivity edge devices (EDs), and we formulate the problem as a mixed-integer non-linear programming (MINP) problem. Due to the high complexity and combinatorial nature of this problem and the difficulty of solving it using conventional methods, we propose a hierarchical multi-agent deep reinforcement learning (DRL)-based framework, called DeepRAT, to solve it efficiently and learn system dynamics. In particular, the DeepRAT framework decomposes the problem into two main stages: the RATs-EDs assignment stage, which implements a single-agent Deep Q Network (DQN) algorithm, and the power allocation stage, which utilizes a multi-agent Deep Deterministic Policy Gradient (DDPG) algorithm. Using simulations, we demonstrate how the various DRL agents efficiently interact to learn system dynamics and derive the globally optimal policy. Furthermore, our simulation results show that the proposed DeepRAT algorithm outperforms existing state-of-the-art heuristic approaches in terms of network utility. Finally, we quantitatively show the ability of the DeepRAT model to quickly and dynamically adapt to abrupt changes in network dynamics, such as EDs mobility.

Towards A Device-Independent Deep Learning Approach for the Automated Segmentation of Sonographic Fetal Brain Structures: A Multi-Center and Multi-Device Validation arxiv:2202.13553 📈 4

Abhi Lad, Adithya Narayan, Hari Shankar, Shefali Jain, Pooja Punjani Vyas, Divya Singh, Nivedita Hegde, Jagruthi Atada, Jens Thang, Saw Shier Nee, Arunkumar Govindarajan, Roopa PS, Muralidhar V Pai, Akhila Vasudeva, Prathima Radhakrishnan, Sripad Krishna Devalla

**Abstract:** Quality assessment of prenatal ultrasonography is essential for the screening of fetal central nervous system (CNS) anomalies. The interpretation of fetal brain structures is highly subjective, expertise-driven, and requires years of training experience, limiting quality prenatal care for all pregnant mothers. With recent advances in Artificial Intelligence (AI), specifically deep learning (DL), methods have been proposed that assist precise anatomy identification through semantic segmentation, which is essential for the reliable assessment of growth and neurodevelopment and for the detection of structural abnormalities. However, existing works only identify certain structures (e.g., cavum septum pellucidum, lateral ventricles, cerebellum) from either of the axial views (transventricular, transcerebellar), limiting the scope for a thorough anatomical assessment as per the practice guidelines necessary for the screening of CNS anomalies. Further, existing works do not analyze the generalizability of these DL algorithms across images from multiple ultrasound devices and centers, thus limiting their real-world clinical impact. In this study, we propose a DL-based segmentation framework for the automated segmentation of 10 key fetal brain structures from 2 axial planes in 2D fetal brain USG images. We developed a custom U-Net variant that uses an inceptionv4 block as a feature extractor and leverages custom domain-specific data augmentation. Quantitatively, the mean Dice coefficients (10 structures; test sets 1/2/3/4) were 0.827, 0.802, 0.731, and 0.783. Irrespective of the USG device/center, the DL segmentations were qualitatively comparable to their manual segmentations. The proposed DL system offers promising and generalizable performance (multi-center, multi-device) and also presents evidence, through UMAP analysis, of device-induced variation in image quality (a challenge to generalizability).

Single-shot self-supervised particle tracking arxiv:2202.13546 📈 4

Benjamin Midtvedt, Jesús Pineda, Fredrik Skärberg, Erik Olsén, Harshith Bachimanchi, Emelie Wesén, Elin K. Esbjörner, Erik Selander, Fredrik Höök, Daniel Midtvedt, Giovanni Volpe

**Abstract:** Particle tracking is a fundamental task in digital microscopy. Recently, machine-learning approaches have made great strides in overcoming the limitations of more classical approaches. The training of state-of-the-art machine-learning methods almost universally relies on either vast amounts of labeled experimental data or the ability to numerically simulate realistic datasets. However, the data produced by experiments are often challenging to label and cannot be easily reproduced numerically. Here, we propose a novel deep-learning method, named LodeSTAR (Low-shot deep Symmetric Tracking And Regression), that learns to track objects with sub-pixel accuracy from a single unlabeled experimental image. This is made possible by exploiting the inherent roto-translational symmetries of the data. We demonstrate that LodeSTAR outperforms traditional methods in terms of accuracy. Furthermore, we analyze challenging experimental data containing densely packed cells or noisy backgrounds. We also exploit additional symmetries to extend the measurable particle properties to the particle's vertical position, by propagating the signal in Fourier space, and to its polarizability, by scaling the signal strength. Thanks to the ability to train deep-learning models with a single unlabeled image, LodeSTAR can accelerate the development of high-quality microscopic analysis pipelines for engineering, biology, and medicine.

A Probabilistic Deep Image Prior for Computational Tomography arxiv:2203.00479 📈 3

Javier Antorán, Riccardo Barbano, Johannes Leuschner, José Miguel Hernández-Lobato, Bangti Jin

**Abstract:** Existing deep-learning based tomographic image reconstruction methods do not provide accurate estimates of reconstruction uncertainty, hindering their real-world deployment. To address this limitation, we construct a Bayesian prior for tomographic reconstruction, which combines the classical total variation (TV) regulariser with the modern deep image prior (DIP). Specifically, we use a change of variables to connect our prior beliefs on the image TV semi-norm with the hyper-parameters of the DIP network. For the inference, we develop an approach based on the linearised Laplace method, which is scalable to high-dimensional settings. The resulting framework provides pixel-wise uncertainty estimates and a marginal likelihood objective for hyperparameter optimisation. We demonstrate the method on synthetic and real-measured high-resolution $\mu$CT data, and show that it provides superior calibration of uncertainty estimates relative to previous probabilistic formulations of the DIP.
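
For reference, a common discrete form of the TV semi-norm that the prior ties to the DIP hyper-parameters is the isotropic one below; whether the paper uses the isotropic or anisotropic variant is not stated in the abstract.

```latex
% Discrete isotropic total-variation semi-norm of an image x (illustrative form).
\mathrm{TV}(x) \;=\; \sum_{i,j} \sqrt{\,(x_{i+1,j}-x_{i,j})^{2} + (x_{i,j+1}-x_{i,j})^{2}\,}
```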

Realtime strategy for image data labelling using binary models and active sampling arxiv:2203.00439 📈 3

Ankush Deshmukh, Bhargava B C, A V Narasimhadhan

**Abstract:** Machine learning (ML) and deep learning (DL) tasks primarily depend on data. Most ML and DL applications involve supervised learning, which requires labelled data. In the early phases of ML, lack of data used to be a problem; now we are in a new era of big data. Supervised ML algorithms require data that is labelled and of good quality, and labelling demands a large investment of money and time: it requires skilled annotators (consider the medical field, where labelling expertise is costly) or many people when the data comes in bulk. It is also important to know how much data is enough for training, so that money and time are not wasted labelling the whole dataset. This paper proposes a strategy that helps label data in real time alongside an oracle. With balancing enabled, the model's contribution to labelling is 89 and 81.1 for the furniture-type and Intel scene image datasets, respectively. With balancing turned off, the model's contribution is 83.47 and 78.71 for the furniture-type and flower datasets, respectively.

Understanding the Challenges When 3D Semantic Segmentation Faces Class Imbalanced and OOD Data arxiv:2203.00214 📈 3

Yancheng Pan, Fan Xie, Huijing Zhao

**Abstract:** 3D semantic segmentation (3DSS) is an essential process in the creation of a safe autonomous driving system. However, deep learning models for 3D semantic segmentation often suffer from the class imbalance problem and out-of-distribution (OOD) data. In this study, we explore how the class imbalance problem affects 3DSS performance and whether the model can detect the correctness of its category predictions, or whether data is ID (in-distribution) or OOD. For these purposes, we conduct two experiments using three representative 3DSS models and five trust scoring methods, and perform both a confusion and a feature analysis of each class. Furthermore, a data augmentation method for the 3D LiDAR dataset is proposed to create a new dataset based on SemanticKITTI and SemanticPOSS, called AugKITTI. We propose the wPre metric and TSD for a more in-depth analysis of the results, and follow these proposals with an insightful discussion. Based on the experimental results, we find that: (1) the classes are imbalanced not only in their data size but also in the basic properties of each semantic category. (2) The intraclass diversity and interclass ambiguity make class learning difficult and greatly limit the models' performance, creating the challenges of semantic and data gaps. (3) The trust scores are unreliable for classes whose features are confused with other classes. For 3DSS models, those misclassified ID classes and OODs may also be given high trust scores, making the 3DSS predictions unreliable and leading to challenges in judging the trustworthiness of 3DSS results. All of these outcomes point to several research directions for improving the performance and reliability of 3DSS models used for real-world applications.

Layer Adaptive Deep Neural Networks for Out-of-distribution Detection arxiv:2203.00192 📈 3

Haoliang Wang, Chen Zhao, Xujiang Zhao, Feng Chen

**Abstract:** During the forward pass of Deep Neural Networks (DNNs), inputs are gradually transformed from low-level features to high-level conceptual labels. While features at different layers could summarize the important factors of the inputs at varying levels, modern out-of-distribution (OOD) detection methods mostly focus on utilizing their ending-layer features. In this paper, we propose a novel layer-adaptive OOD detection framework (LA-OOD) for DNNs that can fully utilize the intermediate layers' outputs. Specifically, instead of training a unified OOD detector at a fixed ending layer, we train multiple One-Class SVM OOD detectors simultaneously at the intermediate layers to exploit the full-spectrum characteristics encoded at varying depths of DNNs. We develop a simple yet effective layer-adaptive policy to identify the best layer for detecting each potential OOD example. LA-OOD can be applied to any existing DNNs and does not require access to OOD samples during training. Using three DNNs of varying depth and architectures, our experiments demonstrate that LA-OOD is robust against OODs of varying complexity and can outperform state-of-the-art competitors by a large margin on some real-world datasets.
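
A minimal sketch of the "one detector per intermediate layer" idea using scikit-learn's One-Class SVM. The aggregation policy (taking the most OOD-like score across layers) and all hyperparameters below are assumptions for illustration, not the paper's exact layer-adaptive policy.

```python
import numpy as np
from sklearn.svm import OneClassSVM

def fit_layer_detectors(layer_features, nu=0.1):
    """Fit one One-Class SVM per intermediate layer.

    layer_features: list of (n_samples, d_l) arrays of in-distribution
    activations, one entry per chosen layer. Hyperparameters are illustrative.
    """
    return [OneClassSVM(nu=nu, kernel="rbf", gamma="scale").fit(f)
            for f in layer_features]

def ood_score(detectors, test_layer_features):
    """Per-sample OOD score: the most negative decision value across layers.

    Taking the minimum is one simple aggregation choice; the paper's
    layer-adaptive policy may differ.
    """
    scores = np.stack([det.decision_function(f)
                       for det, f in zip(detectors, test_layer_features)], axis=0)
    return scores.min(axis=0)   # lower = more OOD-like

# Usage with dummy activations from three layers:
rng = np.random.default_rng(0)
train_feats = [rng.normal(size=(500, d)) for d in (64, 128, 256)]
test_feats = [rng.normal(loc=3.0, size=(10, d)) for d in (64, 128, 256)]
detectors = fit_layer_detectors(train_feats)
print(ood_score(detectors, test_feats))
```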

Robots Autonomously Detecting People: A Multimodal Deep Contrastive Learning Method Robust to Intraclass Variations arxiv:2203.00187 📈 3

Angus Fung, Beno Benhabib, Goldie Nejat

**Abstract:** Robotic detection of people in crowded and/or cluttered human-centered environments including hospitals, long-term care, stores and airports is challenging as people can become occluded by other people or objects, and deform due to variations in clothing or pose. There can also be loss of discriminative visual features due to poor lighting. In this paper, we present a novel multimodal person detection architecture to address the mobile robot problem of person detection under intraclass variations. We present a two-stage training approach using 1) a unique pretraining method we define as Temporal Invariant Multimodal Contrastive Learning (TimCLR), and 2) a Multimodal Faster R-CNN (MFRCNN) detector. TimCLR learns person representations that are invariant under intraclass variations through unsupervised learning. Our approach is unique in that it generates image pairs from natural variations within multimodal image sequences, in addition to synthetic data augmentation, and contrasts crossmodal features to transfer invariances between different modalities. These pretrained features are used by the MFRCNN detector for finetuning and person detection from RGB-D images. Extensive experiments validate the performance of our DL architecture in both human-centered crowded and cluttered environments. Results show that our method outperforms existing unimodal and multimodal person detection approaches in terms of detection accuracy in detecting people with body occlusions and pose deformations in different lighting conditions.

Nuclear Segmentation and Classification Model with Imbalanced Classes for CoNiC Challenge arxiv:2203.00171 📈 3

Jijun Cheng, Xipeng Pan, Feihu Hou, Bingchao Zhao, Jiatai Lin, Zhenbing Liu, Zaiyi Liu, Chu Han

**Abstract:** Nuclear segmentation and classification is an essential step for computational pathology. The TIA lab from Warwick University organized a nuclear segmentation and classification challenge (CoNiC) for H&E stained histopathology images in colorectal cancer based on the Lizard dataset. In this challenge, computer algorithms should be able to segment and recognize six types of nuclei: Epithelial, Lymphocyte, Plasma, Eosinophil, Neutrophil, and Connective tissue. The challenge introduces two highly correlated tasks: nuclei segmentation and classification, and prediction of cellular composition. There are a few obstacles we had to address in this challenge: 1) imbalanced annotations with few training samples for the minority classes, 2) color variation across images from multiple centers or scanners, 3) limited training samples, and 4) similar morphological appearance among classes. To deal with these challenges, we propose a systematic pipeline for nuclear segmentation and classification. First, we built a GAN-based model to automatically generate pseudo images for data augmentation. Then we trained a self-supervised stain normalization model to address the color variation problem. Next, we constructed a baseline model, HoVer-Net, with a cost-sensitive loss to encourage the model to pay more attention to the minority classes. According to the leaderboard, our proposed pipeline achieves 0.40665 mPQ+ (rank 33rd) and 0.62199 r2 (rank 4th) in the preliminary test phase.
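
Cost-sensitive training of the kind described can be as simple as re-weighting the classification loss by inverse class frequency. The sketch below uses hypothetical class counts and PyTorch's built-in weighted cross-entropy; it is one plausible instantiation of a cost-sensitive loss, not necessarily the authors' exact formulation.

```python
import torch
import torch.nn as nn

# Hypothetical per-class nucleus counts for the six CoNIC types; inverse-frequency
# weighting is just one common cost-sensitive choice (the paper's loss may differ).
class_counts = torch.tensor([50_000., 30_000., 8_000., 900., 1_200., 15_000.])
weights = class_counts.sum() / (len(class_counts) * class_counts)

criterion = nn.CrossEntropyLoss(weight=weights)

logits = torch.randn(16, 6)                # per-nucleus class scores
targets = torch.randint(0, 6, (16,))       # ground-truth class indices
loss = criterion(logits, targets)          # rare classes contribute more to the loss
```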

Learning Cross-Video Neural Representations for High-Quality Frame Interpolation arxiv:2203.00137 📈 3

Wentao Shangguan, Yu Sun, Weijie Gan, Ulugbek S. Kamilov

**Abstract:** This paper considers the problem of temporal video interpolation, where the goal is to synthesize a new video frame given its two neighbors. We propose Cross-Video Neural Representation (CURE) as the first video interpolation method based on neural fields (NF). NF refers to the recent class of methods for the neural representation of complex 3D scenes that has seen widespread success and application across computer vision. CURE represents the video as a continuous function parameterized by a coordinate-based neural network, whose inputs are the spatiotemporal coordinates and outputs are the corresponding RGB values. CURE introduces a new architecture that conditions the neural network on the input frames for imposing space-time consistency in the synthesized video. This not only improves the final interpolation quality, but also enables CURE to learn a prior across multiple videos. Experimental evaluations show that CURE achieves the state-of-the-art performance on video interpolation on several benchmark datasets.
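
At its core, a neural field of this kind is a coordinate-based MLP mapping spatiotemporal coordinates to RGB values. The sketch below shows only that bare structure; CURE's conditioning on the two input frames, which the abstract highlights as the key contribution, is omitted.

```python
import torch
import torch.nn as nn

class CoordinateField(nn.Module):
    """Generic coordinate-based field: (x, y, t) -> RGB.

    A bare-bones neural-field sketch; the frame-conditioning used by CURE
    is not included here.
    """

    def __init__(self, hidden=256, layers=4):
        super().__init__()
        dims = [3] + [hidden] * layers
        blocks = []
        for d_in, d_out in zip(dims[:-1], dims[1:]):
            blocks += [nn.Linear(d_in, d_out), nn.ReLU()]
        blocks += [nn.Linear(hidden, 3), nn.Sigmoid()]   # RGB in [0, 1]
        self.net = nn.Sequential(*blocks)

    def forward(self, coords):
        # coords: (N, 3) spatiotemporal coordinates, ideally normalized to [-1, 1].
        return self.net(coords)

# Query the color of a 64x64 pixel grid at an intermediate time t = 0.5.
field = CoordinateField()
ys, xs = torch.meshgrid(torch.linspace(-1, 1, 64),
                        torch.linspace(-1, 1, 64), indexing="ij")
coords = torch.stack([xs, ys, torch.full_like(xs, 0.5)], dim=-1).reshape(-1, 3)
rgb = field(coords).reshape(64, 64, 3)
```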

Setting Fair Incentives to Maximize Improvement arxiv:2203.00134 📈 3

Saba Ahmadi, Hedyeh Beyhaghi, Avrim Blum, Keziah Naggita

**Abstract:** We consider the problem of helping agents improve by setting short-term goals. Given a set of target skill levels, we assume each agent will try to improve from their initial skill level to the closest target level within reach or do nothing if no target level is within reach. We consider two models: the common improvement capacity model, where agents have the same limit on how much they can improve, and the individualized improvement capacity model, where agents have individualized limits. Our goal is to optimize the target levels for social welfare and fairness objectives, where social welfare is defined as the total amount of improvement, and fairness objectives are considered where the agents belong to different underlying populations. A key technical challenge of this problem is the non-monotonicity of social welfare in the set of target levels, i.e., adding a new target level may decrease the total amount of improvement as it may get easier for some agents to improve. This is especially challenging when considering multiple groups because optimizing target levels in isolation for each group and outputting the union may result in arbitrarily low improvement for a group, failing the fairness objective. Considering these properties, we provide algorithms for optimal and near-optimal improvement for both social welfare and fairness objectives. These algorithmic results work for both the common and individualized improvement capacity models. Furthermore, we show a placement of target levels exists that is approximately optimal for the social welfare of each group. Unlike the algorithmic results, this structural statement only holds in the common improvement capacity model, and we show counterexamples in the individualized improvement capacity model. Finally, we extend our algorithms to learning settings where we have only sample access to the initial skill levels of agents.

On classification of strategic agents who can both game and improve arxiv:2203.00124 📈 3

Saba Ahmadi, Hedyeh Beyhaghi, Avrim Blum, Keziah Naggita

**Abstract:** In this work, we consider classification of agents who can both game and improve. For example, people wishing to get a loan may be able to take some actions that increase their perceived credit-worthiness and others that also increase their true credit-worthiness. A decision-maker would like to define a classification rule with few false-positives (does not give out many bad loans) while yielding many true positives (giving out many good loans), which includes encouraging agents to improve to become true positives if possible. We consider two models for this problem, a general discrete model and a linear model, and prove algorithmic, learning, and hardness results for each. For the general discrete model, we give an efficient algorithm for the problem of maximizing the number of true positives subject to no false positives, and show how to extend this to a partial-information learning setting. We also show hardness for the problem of maximizing the number of true positives subject to a nonzero bound on the number of false positives, and that this hardness holds even for a finite-point version of our linear model. We also show that maximizing the number of true positives subject to no false positive is NP-hard in our full linear model. We additionally provide an algorithm that determines whether there exists a linear classifier that classifies all agents accurately and causes all improvable agents to become qualified, and give additional results for low-dimensional data.

Effectiveness of Delivered Information Trade Study arxiv:2203.00116 📈 3

Matthew Ciolino

**Abstract:** The sensor to shooter timeline is affected by two main variables: satellite positioning and asset positioning. Speeding up satellite positioning by adding more sensors or by decreasing processing time matters only if there is a prepared shooter; otherwise, the main source of delay is getting the shooter into position. Nevertheless, the intelligence community should work towards exploiting sensors with the highest speed and effectiveness possible. Achieving high effectiveness while keeping speed high is a tradeoff that must be considered in the sensor to shooter timeline. In this paper we investigate two main ideas: increasing the effectiveness of satellite imagery through image manipulation, and how on-board image manipulation would affect the sensor to shooter timeline. We cover these ideas in four scenarios: a Discrete Event Simulation of onboard processing versus ground station processing, quality of information with cloud cover removal, information improvement with super resolution, and data reduction with image to caption. This paper shows how image manipulation techniques such as Super Resolution, Cloud Removal, and Image to Caption improve the quality of delivered information, in addition to showing how those processes affect the sensor to shooter timeline.

Estimating causal effects with optimization-based methods: A review and empirical comparison arxiv:2203.00097 📈 3

Martin Cousineau, Vedat Verter, Susan A. Murphy, Joelle Pineau

**Abstract:** In the absence of randomized controlled and natural experiments, it is necessary to balance the distributions of (observable) covariates of the treated and control groups in order to obtain an unbiased estimate of a causal effect of interest; otherwise, a different effect size may be estimated, and incorrect recommendations may be given. To achieve this balance, there exist a wide variety of methods. In particular, several methods based on optimization models have been recently proposed in the causal inference literature. While these optimization-based methods empirically showed an improvement over a limited number of other causal inference methods in their relative ability to balance the distributions of covariates and to estimate causal effects, they have not been thoroughly compared to each other and to other noteworthy causal inference methods. In addition, we believe that there exist several unaddressed opportunities that operational researchers could contribute with their advanced knowledge of optimization, for the benefits of the applied researchers that use causal inference tools. In this review paper, we present an overview of the causal inference literature and describe in more detail the optimization-based causal inference methods, provide a comparative analysis of the prevailing optimization-based methods, and discuss opportunities for new methods.

Towards Targeted Change Detection with Heterogeneous Remote Sensing Images for Forest Mortality Mapping arxiv:2203.00049 📈 3

Jørgen A. Agersborg, Luigi T. Luppino, Stian Normann Anfinsen, Jane Uhd Jepsen

**Abstract:** In this paper we develop a method for mapping forest mortality in the forest-tundra ecotone using satellite data from heterogeneous sensors. We use medium resolution imagery in order to provide the complex pattern of forest mortality in this sparsely forested area, which has been induced by an outbreak of geometrid moths. Specifically, Landsat-5 Thematic Mapper images from before the event are used, with RADARSAT-2 providing the post-event images. We obtain the difference images for both multispectral optical and synthetic aperture radar (SAR) by using a recently developed deep learning method for translating between the two domains. These differences are stacked with the original pre- and post-event images in order to let our algorithm also learn how the areas appear before and after the change event. By doing this, and focusing on learning only the changes of interest with one-class classification (OCC), we obtain good results with very little training data.

The complexity of quantum support vector machines arxiv:2203.00031 📈 3

Gian Gentinetta, Arne Thomsen, David Sutter, Stefan Woerner

**Abstract:** Quantum support vector machines employ quantum circuits to define the kernel function. It has been shown that this approach offers a provable exponential speedup compared to any known classical algorithm for certain data sets. The training of such models corresponds to solving a convex optimization problem either via its primal or dual formulation. Due to the probabilistic nature of quantum mechanics, the training algorithms are affected by statistical uncertainty, which has a major impact on their complexity. We show that the dual problem can be solved in $\mathcal{O}(M^{4.67}/\varepsilon^2)$ quantum circuit evaluations, where $M$ denotes the size of the data set and $\varepsilon$ the solution accuracy. We prove under an empirically motivated assumption that the kernelized primal problem can alternatively be solved in $\mathcal{O}(\min \{ M^2/\varepsilon^6, \, 1/\varepsilon^{10} \})$ evaluations by employing a generalization of a known classical algorithm called Pegasos. Accompanying empirical results demonstrate these analytical complexities to be essentially tight. In addition, we investigate a variational approximation to quantum support vector machines and show that their heuristic training achieves considerably better scaling in our experiments.

SUNet: Swin Transformer UNet for Image Denoising arxiv:2202.14009 📈 3

Chi-Mao Fan, Tsung-Jung Liu, Kuan-Hsien Liu

**Abstract:** Image restoration is a challenging ill-posed problem which has also been a long-standing issue. In the past few years, convolutional neural networks (CNNs) have almost dominated computer vision and achieved considerable success at different levels of vision tasks, including image restoration. Recently, however, Swin Transformer-based models have also shown impressive performance, even surpassing CNN-based methods to become the state of the art on high-level vision tasks. In this paper, we propose a restoration model called SUNet, which uses the Swin Transformer layer as its basic block and applies it within a UNet architecture for image denoising. The source code and pre-trained models are available at https://github.com/FanChiMao/SUNet.

Risk-Neutral Market Simulation arxiv:2202.13996 📈 3

Magnus Wiese, Phillip Murray

**Abstract:** We develop a risk-neutral spot and equity option market simulator for a single underlying, under which the joint market process is a martingale. We leverage an efficient low-dimensional representation of the market which preserves no static arbitrage, and employ neural spline flows to simulate samples which are free from conditional drifts and are highly realistic in the sense that among all possible risk-neutral simulators, the obtained risk-neutral simulator is the closest to the historical data with respect to the Kullback-Leibler divergence. Numerical experiments demonstrate the effectiveness and highlight both drift removal and fidelity of the calibrated simulator.

Defect detection and segmentation in X-Ray images of magnesium alloy castings using the Detectron2 framework arxiv:2202.13945 📈 3

Francisco Javier Yagüe, Jose Francisco Diez-Pastor, Pedro Latorre-Carmona, Cesar Ignacio Garcia Osorio

**Abstract:** New production techniques have emerged that make it possible to produce metal parts with more complex shapes, making the quality control process more difficult. This means that visual and surface-level analysis has become even more inefficient. On top of that, it is also not possible to detect internal defects that these parts could have. The use of X-Ray images has made this process much easier, allowing not only superficial defects to be detected in a much simpler way, but also welding or casting defects that could represent a serious hazard for the physical integrity of the metal parts. On the other hand, the use of an automatic segmentation approach for detecting defects would help diminish the dependence of defect detection on the subjectivity of factory operators and its variability over time. The aim of this paper is to apply a deep learning system based on Detectron2, a state-of-the-art library for object detection and segmentation in images, to the identification and segmentation of these defects in X-Ray images obtained mainly from automotive parts.
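
Detectron2 is a documented library, so a generic instance-segmentation inference setup looks roughly like the sketch below. The config file, weights path, class count, score threshold, and image path are placeholders; the paper's actual backbone and training configuration are not given in the abstract.

```python
import cv2
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

# Generic Detectron2 setup; backbone choice, weights, and class list below
# are assumptions, not the paper's configuration.
cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = "output/model_final.pth"   # hypothetical fine-tuned weights
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1            # e.g., a single "defect" class
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5

predictor = DefaultPredictor(cfg)
image = cv2.imread("xray_casting.png")         # hypothetical X-ray image path
outputs = predictor(image)
print(outputs["instances"].pred_boxes, outputs["instances"].pred_masks.shape)
```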

Functional mixture-of-experts for classification arxiv:2202.13934 📈 3

Nhat Thien Pham, Faicel Chamroukhi

**Abstract:** We develop a mixture-of-experts (ME) approach to multiclass classification where the predictors are univariate functions. It consists of an ME model in which both the gating network and the experts network are constructed from multinomial logistic activation functions with functional inputs. We perform regularized maximum likelihood estimation in which the coefficient functions enjoy interpretable sparsity constraints on targeted derivatives. We develop an EM-Lasso-like algorithm to compute the regularized MLE and evaluate the proposed approach on simulated and real data.
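
A schematic form of the functional softmax gating consistent with the abstract is given below, where the inner product is the usual $L^2$ pairing of the functional input with a coefficient function; the exact parameterization, basis expansion, and penalty are in the paper, not here.

```latex
% Multinomial logistic (softmax) gating with a functional input X(\cdot);
% schematic form, not the paper's exact parameterization.
\pi_k\big(X\big) \;=\;
  \frac{\exp\!\Big(\alpha_k + \int X(t)\,\beta_k(t)\,dt\Big)}
       {\sum_{l=1}^{K} \exp\!\Big(\alpha_l + \int X(t)\,\beta_l(t)\,dt\Big)},
  \qquad k = 1,\dots,K .
```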

The Causal Marginal Polytope for Bounding Treatment Effects arxiv:2202.13851 📈 3

Jakob Zeitler, Ricardo Silva

**Abstract:** Due to unmeasured confounding, it is often not possible to identify causal effects from a postulated model. Nevertheless, we can ask for partial identification, which usually boils down to finding upper and lower bounds of a causal quantity of interest derived from all solutions compatible with the encoded structural assumptions. One appealing way to derive such bounds is by casting it in terms of a constrained optimization method that searches over all causal models compatible with evidence, as introduced in the classic work of Balke and Pearl (1994) for discrete data. Although by construction this guarantees tight bounds, it poses a formidable computational challenge. To cope with this issue, alternatives include algorithms that are not guaranteed to be tight, or by introducing restrictions on the class of models. In this paper, we introduce a novel alternative: inspired by ideas coming from belief propagation, we enforce compatibility between marginals of a causal model and data, without constructing a global causal model. We call this collection of locally consistent marginals the causal marginal polytope. As global independence constraints disappear when considering small dimensional tractable marginals, this also leads to a rethinking of how to elicit and express causal knowledge. We provide an explicit algorithm and implementation of this idea, and assess its practicality with numerical experiments.

RestainNet: a self-supervised digital re-stainer for stain normalization arxiv:2202.13804 📈 3

Bingchao Zhao, Jiatai Lin, Changhong Liang, Zongjian Yi, Xin Chen, Bingbing Li, Weihao Qiu, Danyi Li, Li Liang, Chu Han, Zaiyi Liu

**Abstract:** Color inconsistency is an inevitable challenge in computational pathology, which generally arises from stain intensity variations or sections scanned by different scanners. It harms pathological image analysis methods, especially learning-based models. A series of approaches have been proposed for stain normalization. However, most of them lack flexibility in practice. In this paper, we formulate stain normalization as a digital re-staining process and propose a self-supervised learning model called RestainNet. Our network acts as a digital restainer which learns how to re-stain an unstained (grayscale) image. Two digital stains, Hematoxylin (H) and Eosin (E), are extracted from the original image via Beer-Lambert's Law. We propose a staining loss to maintain the correctness of stain intensity during the re-staining process. Thanks to the self-supervised nature, paired training samples are no longer necessary, which demonstrates great flexibility in practical usage. Our RestainNet outperforms existing approaches and achieves state-of-the-art performance with regard to color correctness and structure preservation. We further conducted experiments on segmentation and classification tasks, where the proposed RestainNet achieved outstanding performance compared with SOTA methods. The self-supervised design allows the network to learn any staining style with no extra effort.
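
The H/E extraction step follows Beer-Lambert's law: convert RGB intensities to optical densities and project onto reference stain vectors. The sketch below uses the widely cited Ruifrok-Johnston stain vectors; whether RestainNet uses these exact reference vectors (or this exact projection) is an assumption.

```python
import numpy as np

# Commonly used Ruifrok-Johnston H&E stain vectors (unit-normalized RGB absorbances).
# Whether RestainNet uses exactly these reference vectors is an assumption.
H_VEC = np.array([0.650, 0.704, 0.286])
E_VEC = np.array([0.072, 0.990, 0.105])

def separate_he(rgb_image):
    """Estimate per-pixel H and E concentrations via Beer-Lambert's law.

    rgb_image: uint8 array of shape (H, W, 3). Optical density is
    OD = -log10((I + 1) / 255); concentrations come from a least-squares
    projection onto the two stain vectors.
    """
    od = -np.log10((rgb_image.astype(np.float64) + 1.0) / 255.0)
    stain_matrix = np.stack([H_VEC, E_VEC], axis=1)           # (3, 2)
    flat_od = od.reshape(-1, 3).T                             # (3, N)
    concentrations, *_ = np.linalg.lstsq(stain_matrix, flat_od, rcond=None)
    h_map, e_map = concentrations.reshape(2, *rgb_image.shape[:2])
    return h_map, e_map

# Reconstruction back to RGB follows the same law:
# I = 255 * 10 ** (-(c_H * H_VEC + c_E * E_VEC)).
```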

Selection, Ignorability and Challenges With Causal Fairness arxiv:2202.13774 📈 3

Jake Fawkes, Robin Evans, Dino Sejdinovic

**Abstract:** In this paper we look at popular fairness methods that use causal counterfactuals. These methods capture the intuitive notion that a prediction is fair if it coincides with the prediction that would have been made if someone's race, gender or religion were counterfactually different. In order to achieve this, we must have causal models that are able to capture what someone would be like if we were to counterfactually change these traits. However, we argue that any model that can do this must lie outside the particularly well behaved class that is commonly considered in the fairness literature. This is because in fairness settings, models in this class entail a particularly strong causal assumption, normally only seen in a randomised controlled trial. We argue that in general this is unlikely to hold. Furthermore, we show in many cases it can be explicitly rejected due to the fact that samples are selected from a wider population. We show this creates difficulties for counterfactual fairness as well as for the application of more general causal fairness methods.

Towards Robust Stacked Capsule Autoencoder with Hybrid Adversarial Training arxiv:2202.13755 📈 3

Jiazhu Dai, Siwei Xiong

**Abstract:** Capsule networks (CapsNets) are new neural networks that classify images based on the spatial relationships of features. By analyzing the pose of features and their relative positions, they are better able to recognize images after affine transformations. The stacked capsule autoencoder (SCAE) is a state-of-the-art CapsNet, and achieved unsupervised classification with CapsNets for the first time. However, the security vulnerabilities and the robustness of the SCAE have rarely been explored. In this paper, we propose an evasion attack against SCAE, where the attacker can generate adversarial perturbations by reducing the contribution of the object capsules in SCAE related to the original category of the image. The adversarial perturbations are then applied to the original images, and the perturbed images will be misclassified. Furthermore, we propose a defense method called Hybrid Adversarial Training (HAT) against such evasion attacks. HAT makes use of adversarial training and adversarial distillation to achieve better robustness and stability. We evaluate the defense method, and the experimental results show that the refined SCAE model can achieve 82.14% classification accuracy under evasion attack. The source code is available at https://github.com/FrostbiteXSW/SCAE_Defense.

Enhance transferability of adversarial examples with model architecture arxiv:2202.13625 📈 3

Mingyuan Fan, Wenzhong Guo, Shengxing Yu, Zuobin Ying, Ximeng Liu

**Abstract:** Transferability of adversarial examples is of critical importance for launching black-box adversarial attacks, where attackers are only allowed to access the output of the target model. However, under such a challenging but practical setting, the crafted adversarial examples are always prone to overfitting to the proxy model employed, resulting in poor transferability. In this paper, we suggest alleviating the overfitting issue from a novel perspective, i.e., designing a fitted model architecture. Specifically, delving into the root cause of poor transferability, we decompose and reconstruct the existing model architecture into an effective model architecture, namely the multi-track model architecture (MMA). Adversarial examples crafted on the MMA can maximally reduce the effect of model-specific features and move toward the vulnerable directions shared by diverse architectures. Extensive experimental evaluation demonstrates that the transferability of adversarial examples based on the MMA significantly surpasses that of other state-of-the-art model architectures, by up to 40% with comparable overhead.

Asynchronous Decentralized Federated Learning for Collaborative Fault Diagnosis of PV Stations arxiv:2202.13606 📈 3

Qi Liu, Bo Yang, Zhaojian Wang, Dafeng Zhu, Xinyi Wang, Kai Ma, Xinping Guan

**Abstract:** Due to the different losses caused by various photovoltaic (PV) array faults, accurate diagnosis of fault types is becoming increasingly important. Compared with a single one, multiple PV stations collect sufficient fault samples, but their data is not allowed to be shared directly due to potential conflicts of interest. Therefore, federated learning can be exploited to train a collaborative fault diagnosis model. However, the modeling efficiency is seriously affected by the model update mechanism since each PV station has a different computing capability and amount of data. Moreover, for the safe and stable operation of the PV system, the robustness of collaborative modeling must be guaranteed rather than simply being processed on a central server. To address these challenges, a novel asynchronous decentralized federated learning (ADFL) framework is proposed. Each PV station not only trains its local model but also participates in collaborative fault diagnosis by exchanging model parameters to improve the generalization without losing accuracy. The global model is aggregated distributedly to avoid central node failure. By designing the asynchronous update scheme, the communication overhead and training time are greatly reduced. Both the experiments and numerical simulations are carried out to verify the effectiveness of the proposed method.

Unsupervised Representation Learning for Point Clouds: A Survey arxiv:2202.13589 📈 3

Aoran Xiao, Jiaxing Huang, Dayan Guan, Shijian Lu

**Abstract:** Point cloud data have been widely explored due to its superior accuracy and robustness under various adverse situations. Meanwhile, deep neural networks (DNNs) have achieved very impressive success in various applications such as surveillance and autonomous driving. The convergence of point cloud and DNNs has led to many deep point cloud models, largely trained under the supervision of large-scale and densely-labelled point cloud data. Unsupervised point cloud representation learning, which aims to learn general and useful point cloud representations from unlabelled point cloud data, has recently attracted increasing attention due to the constraint in large-scale point cloud labelling. This paper provides a comprehensive review of unsupervised point cloud representation learning using DNNs. It first describes the motivation, general pipelines as well as terminologies of the recent studies. Relevant background including widely adopted point cloud datasets and DNN architectures is then briefly presented. This is followed by an extensive discussion of existing unsupervised point cloud representation learning methods according to their technical approaches. We also quantitatively benchmark and discuss the reviewed methods over multiple widely adopted point cloud datasets. Finally, we share our humble opinion about several challenges and problems that could be pursued in the future research in unsupervised point cloud representation learning. A project associated with this survey has been built at https://github.com/xiaoaoran/3d_url_survey.

KL Divergence Estimation with Multi-group Attribution arxiv:2202.13576 📈 3

Parikshit Gopalan, Nina Narodytska, Omer Reingold, Vatsal Sharan, Udi Wieder

**Abstract:** Estimating the Kullback-Leibler (KL) divergence between two distributions given samples from them is well-studied in machine learning and information theory. Motivated by considerations of multi-group fairness, we seek KL divergence estimates that accurately reflect the contributions of sub-populations to the overall divergence. We model the sub-populations coming from a rich (possibly infinite) family $\mathcal{C}$ of overlapping subsets of the domain. We propose the notion of multi-group attribution for $\mathcal{C}$, which requires that the estimated divergence conditioned on every sub-population in $\mathcal{C}$ satisfies some natural accuracy and fairness desiderata, such as ensuring that sub-populations where the model predicts significant divergence do diverge significantly in the two distributions. Our main technical contribution is to show that multi-group attribution can be derived from the recently introduced notion of multi-calibration for importance weights [HKRR18, GRSW21]. We provide experimental evidence to support our theoretical results, and show that multi-group attribution provides better KL divergence estimates when conditioned on sub-populations than other popular algorithms.
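
The link to importance weights can be made explicit: with $w(x) = p(x)/q(x)$, the KL divergence is an expectation of $w \log w$ under $Q$, which is why a multi-calibrated estimate of $w$ on sub-populations can translate into sub-population-faithful divergence estimates. Schematically:

```latex
% KL divergence and its importance-weight form: an estimate of w = p/q on
% samples from Q yields a plug-in divergence estimate.
D_{\mathrm{KL}}(P \,\|\, Q)
  \;=\; \mathbb{E}_{x \sim P}\!\left[\log \frac{p(x)}{q(x)}\right]
  \;=\; \mathbb{E}_{x \sim Q}\!\left[w(x)\log w(x)\right],
  \qquad w(x) = \frac{p(x)}{q(x)} .
```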

Filter-enhanced MLP is All You Need for Sequential Recommendation arxiv:2202.13556 📈 3

Kun Zhou, Hui Yu, Wayne Xin Zhao, Ji-Rong Wen

**Abstract:** Recently, deep neural networks such as RNNs, CNNs and Transformers have been applied to the task of sequential recommendation, which aims to capture dynamic preference characteristics from logged user behavior data for accurate recommendation. However, on online platforms, logged user behavior data inevitably contains noise, and deep recommendation models easily overfit to these logged data. To tackle this problem, we borrow the idea of filtering algorithms from signal processing, which attenuate noise in the frequency domain. In our empirical experiments, we find that filtering algorithms can substantially improve representative sequential recommendation models, and integrating simple filtering algorithms (e.g., a Band-Stop Filter) with an all-MLP architecture can even outperform competitive Transformer-based models. Motivated by this, we propose \textbf{FMLP-Rec}, an all-MLP model with learnable filters for the sequential recommendation task. The all-MLP architecture endows our model with lower time complexity, and the learnable filters can adaptively attenuate the noise information in the frequency domain. Extensive experiments conducted on eight real-world datasets demonstrate the superiority of our proposed method over competitive RNN-, CNN-, GNN- and Transformer-based methods. Our code and data are publicly available at the link: \textcolor{blue}{\url{https://github.com/RUCAIBox/FMLP-Rec}}.
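
The filtering idea is compact enough to sketch: FFT the item-embedding sequence along the time axis, multiply by a learnable complex filter, and inverse-FFT. The layer below is a minimal PyTorch illustration of that idea; layer normalization, dropout, and the feed-forward sub-layer of the full FMLP-Rec block are omitted, and the initialization scale is arbitrary.

```python
import torch
import torch.nn as nn

class LearnableFilterLayer(nn.Module):
    """Frequency-domain filtering of an item-embedding sequence.

    FFT along the sequence axis, multiply by a learnable complex filter,
    inverse FFT. A minimal sketch of the core idea, not the full FMLP-Rec block.
    """

    def __init__(self, max_len, hidden):
        super().__init__()
        # One complex weight per (frequency, hidden dim), stored as real pairs.
        self.weight = nn.Parameter(
            torch.randn(max_len // 2 + 1, hidden, 2) * 0.02)

    def forward(self, x):                      # x: (batch, seq_len, hidden)
        spec = torch.fft.rfft(x, dim=1)        # (batch, seq_len//2 + 1, hidden)
        spec = spec * torch.view_as_complex(self.weight)
        return torch.fft.irfft(spec, n=x.size(1), dim=1)

# Usage on a batch of 50-step interaction sequences with 64-d embeddings:
layer = LearnableFilterLayer(max_len=50, hidden=64)
out = layer(torch.randn(8, 50, 64))            # same shape as the input
```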

Gait Events Prediction using Hybrid CNN-RNN-based Deep Learning models through a Single Waist-worn Wearable Sensor arxiv:2203.00503 📈 2

Muhammad Zeeshan Arshad, Ankhzaya Jamsrandorj, Jinwook Kim, Kyung-Ryoul Mun

**Abstract:** Elderly gait is a source of rich information about their physical and mental health condition. As an alternative to multiple sensors on the lower body parts, a single sensor on the pelvis has a positional advantage and an abundance of acquirable information. This study aimed to explore a way of improving the accuracy of gait event detection in the elderly using a single sensor on the waist and deep learning models. Data were gathered from elderly subjects equipped with three IMU sensors while they walked. Only the input from the waist sensor was used to train 16 deep-learning models, including CNN, RNN, and CNN-RNN hybrids with or without the Bidirectional and Attention mechanisms. The ground truth was extracted from the foot IMU sensors. Fairly high accuracies of 99.73% and 93.89% were achieved by the CNN-BiGRU-Att model at tolerance windows of $\pm$6TS ($\pm$6ms) and $\pm$1TS ($\pm$1ms), respectively. Advancing from previous studies exploring gait event detection, the model showed a great improvement in prediction error, with an MAE of 6.239ms and 5.24ms for HS and TO events, respectively, at a tolerance window of $\pm$1TS. The results show that the use of CNN-RNN hybrid models with Attention and Bidirectional mechanisms is promising for accurate gait event detection using a single waist sensor. The study can contribute to reducing the burden of gait detection and increasing its applicability in future wearable devices that can be used for remote health monitoring (RHM) or diagnosis based thereon.

Disentangled Spatiotemporal Graph Generative Models arxiv:2203.00411 📈 2

Yuanqi Du, Xiaojie Guo, Hengning Cao, Yanfang Ye, Liang Zhao

**Abstract:** A spatiotemporal graph represents a crucial data structure where the nodes and edges are embedded in a geometric space and can evolve dynamically over time. Nowadays, spatiotemporal graph data is becoming increasingly popular and important, ranging from micro-scale (e.g. protein folding), to middle-scale (e.g. dynamic functional connectivity), to macro-scale (e.g. human mobility networks). Although disentangling and understanding the correlations among the spatial, temporal, and graph aspects has been a long-standing key topic in network science, existing approaches typically rely on network processing hypothesized by human knowledge. This usually fits well for graph properties that can be predefined, but does not work well in most cases, especially in key domains where human knowledge is still very limited, such as protein folding and biological neuronal networks. In this paper, we aim at pushing forward the modeling and understanding of spatiotemporal graphs via new disentangled deep generative models. Specifically, a new Bayesian model is proposed that factorizes spatiotemporal graphs into spatial, temporal, and graph factors, as well as factors that explain the interplay among them. A variational objective function and new mutual information thresholding algorithms driven by information bottleneck theory are proposed to maximize the disentanglement among the factors with theoretical guarantees. Qualitative and quantitative experiments on both synthetic and real-world datasets demonstrate the superiority of the proposed model over the state of the art by up to 69.2% for graph generation and 41.5% for interpretability.

FedREP: Towards Horizontal Federated Load Forecasting for Retail Energy Providers arxiv:2203.00219 📈 2

Muhammad Akbar Husnoo, Adnan Anwar, Nasser Hosseinzadeh, Shama Naz Islam, Abdun Naser Mahmood, Robin Doss

**Abstract:** As Smart Meters are collecting and transmitting household energy consumption data to Retail Energy Providers (REPs), the main challenge is to ensure the effective use of fine-grained consumer data while ensuring data privacy. In this manuscript, we tackle this challenge for energy load consumption forecasting by REPs, which is essential to energy demand management, load switching and infrastructure development. Specifically, we note that existing energy load forecasting is centralized, which is not scalable and, most importantly, is vulnerable to data privacy threats. Besides, REPs are individual market participants and liable to ensure the privacy of their own customers. To address this issue, we propose a novel horizontal privacy-preserving federated learning framework for REP energy load forecasting, namely FedREP. We consider a federated learning system consisting of a control centre and multiple retailers, enabling multiple REPs to build a common, robust machine learning model without sharing data, thus addressing critical issues such as data privacy, data security and scalability. For forecasting, we use a state-of-the-art Long Short-Term Memory (LSTM) neural network, due to its ability to learn long sequences of observations, its promise of higher accuracy on time-series data, and its handling of the vanishing gradient problem. Finally, we conduct extensive data-driven experiments using a real energy consumption dataset. Experimental results demonstrate that our proposed federated learning framework achieves sufficient performance, with an MSE ranging between 0.3 and 0.4, which is relatively similar to that of a centralized approach, while preserving privacy and improving scalability.

EPPAC: Entity Pre-typing Relation Classification with Prompt Answer Centralizing arxiv:2203.00193 📈 2

Jiejun Tan, Wenbin Hu, WeiWei Liu

**Abstract:** Relation classification (RC) aims to predict the relationship between a pair of subject and object in a given context. Recently, prompt tuning approaches have achieved high performance in RC. However, existing prompt tuning approaches have the following issues: (1) numerous categories decrease RC performance; (2) manually designed prompts require intensive labor. To address these issues, a novel paradigm, Entity Pre-typing Relation Classification with Prompt Answer Centralizing (EPPAC), is proposed in this paper. Entity pre-typing in EPPAC addresses the first issue using a double-level framework that pre-types entities before RC, and prompt answer centralizing is proposed to address the second issue. Extensive experiments show that our proposed EPPAC outperforms state-of-the-art approaches on TACRED and TACREV by 14.4% and 11.1%, respectively. The code is provided in the Supplementary Materials.

GROW: A Row-Stationary Sparse-Dense GEMM Accelerator for Memory-Efficient Graph Convolutional Neural Networks arxiv:2203.00158 📈 2

Minhoo Kang, Ranggi Hwang, Jiwon Lee, Dongyun Kam, Youngjoo Lee, Minsoo Rhu

**Abstract:** Graph convolutional neural networks (GCNs) have emerged as a key technology in various application domains where the input data is relational. A unique property of GCNs is that their two primary execution stages, aggregation and combination, exhibit drastically different dataflows. Consequently, prior GCN accelerators tackle this research space by casting the aggregation and combination stages as a series of sparse-dense matrix multiplications. However, prior work frequently suffers from inefficient data movement, leaving significant performance on the table. We present GROW, a GCN accelerator based on Gustavson's algorithm that architects a row-wise product based sparse-dense GEMM accelerator. GROW co-designs the software and hardware to strike a balance between locality and parallelism for GCNs, achieving significant energy-efficiency improvements over state-of-the-art GCN accelerators.
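
Gustavson's (row-wise product) dataflow computes each output row by accumulating rows of the dense operand selected by the sparse row's nonzeros, which is the dataflow GROW maps into hardware. The snippet below is only a functional illustration of that dataflow in SciPy/NumPy, not a model of the accelerator.

```python
import numpy as np
from scipy.sparse import random as sparse_random

def rowwise_spmm(A_csr, B):
    """Row-wise (Gustavson) sparse-dense product C = A @ B.

    For each row i of the sparse matrix A, accumulate the rows of the dense
    matrix B selected by A's nonzero column indices, scaled by the nonzero
    values. Functional illustration only; no performance modeling.
    """
    n_rows, k = A_csr.shape[0], B.shape[1]
    C = np.zeros((n_rows, k))
    for i in range(n_rows):
        start, end = A_csr.indptr[i], A_csr.indptr[i + 1]
        for col, val in zip(A_csr.indices[start:end], A_csr.data[start:end]):
            C[i] += val * B[col]               # reuse one row of B at a time
    return C

# Check against the library product on a random sparse adjacency-like matrix.
A = sparse_random(100, 80, density=0.05, format="csr", random_state=0)
B = np.random.default_rng(0).normal(size=(80, 16))
assert np.allclose(rowwise_spmm(A, B), A @ B)
```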

Molecular Dynamics of Polymer-lipids in Solution from Supervised Machine Learning arxiv:2203.00151 📈 2

James Andrews, Olga Gkountouna, Estela Blaisten-Barojas

**Abstract:** Machine learning techniques, including neural networks, are popular tools for materials and chemical scientists, with applications that may provide viable alternative methods for analyzing the structure and energetics of systems ranging from crystals to biomolecules. However, efforts to predict dynamics are less abundant. Here we explore the ability of three well-established recurrent neural network architectures to forecast the energetics of a macromolecular polymer-lipid aggregate solvated in ethyl acetate at ambient conditions. Data models generated from the recurrent neural networks are trained and tested on nanoseconds-long time series of the intra-macromolecular potential energy and its interaction energy with the solvent, generated from Molecular Dynamics and containing half a million points. Our exhaustive analyses show that the three recurrent neural networks investigated generate data models with a limited capability of reproducing the energetic fluctuations, yielding short- or long-term energetics forecasts whose underlying distribution of points is inconsistent with the distributions of the input series. We propose an in silico experimental protocol consisting of forming an ensemble of artificial network models trained on an ensemble of series with additional features from time series containing pre-clustered time patterns of the original series. The forecasting process improves by predicting a band of forecasted time series with a spread of values consistent with the span of the molecular dynamics energy fluctuations. However, the distribution of points from the band of forecasts is still not optimal. Although the three inspected recurrent neural networks were unable to generate single models that reproduce the actual fluctuations of the inspected molecular system energies in thermal equilibrium at the nanosecond scale, the proposed protocol provides useful estimates of the molecular fate.

Explaining RADAR features for detecting spoofing attacks in Connected Autonomous Vehicles arxiv:2203.00150 📈 2

Nidhi Rastogi, Sara Rampazzi, Michael Clifford, Miriam Heller, Matthew Bishop, Karl Levitt

**Abstract:** Connected autonomous vehicles (CAVs) are anticipated to have built-in AI systems for defending against cyberattacks. Machine learning (ML) models form the basis of many such AI systems. These models are notorious for acting like black boxes, transforming inputs into solutions with great accuracy, but no explanations support their decisions. Explanations are needed to communicate model performance, make decisions transparent, and establish trust in the models with stakeholders. Explanations can also indicate when humans must take control, for instance, when the ML model makes low-confidence decisions or offers multiple or ambiguous alternatives. Explanations also provide evidence for post-incident forensic analysis. Research on applying explainable ML to security problems is limited, and even more so concerning CAVs. This paper surfaces a critical yet under-researched sensor data \textit{uncertainty} problem for training ML attack detection models, especially in highly mobile and risk-averse platforms such as autonomous vehicles. We present a model that explains \textit{certainty} and \textit{uncertainty} in sensor input -- a missing characteristic in data collection. We hypothesize that model explanations are inaccurate for a given system without explainable input data quality. We estimate \textit{uncertainty} and mass functions for features in radar sensor data and incorporate them into the training model through experimental evaluation. The mass function allows the classifier to categorize all spoofed inputs accurately with an incorrect class label.

Investigating the Spatiotemporal Charging Demand and Travel Behavior of Electric Vehicles Using GPS Data: A Machine Learning Approach arxiv:2203.00135 📈 2

Sina Baghali, Zhaomiao Guo, Samiul Hasan

**Abstract:** The increasing market penetration of electric vehicles (EVs) may change the travel behavior of drivers and pose a significant electricity demand on the power system. Since the electricity demand depends on the travel behavior of EVs, which are inherently uncertain, the forecasting of daily charging demand (CD) will be a challenging task. In this paper, we use the recorded GPS data of EVs and conventional gasoline-powered vehicles from the same city to investigate the potential shift in the travel behavior of drivers from conventional vehicles to EVs and forecast the spatiotemporal patterns of daily CD. Our analysis reveals that the travel behavior of EVs and conventional vehicles are similar. Also, the forecasting results indicate that the developed models can generate accurate spatiotemporal patterns of the daily CD.

BlazeNeo: Blazing fast polyp segmentation and neoplasm detection arxiv:2203.00129 📈 2

Nguyen Sy An, Phan Ngoc Lan, Dao Viet Hang, Dao Van Long, Tran Quang Trung, Nguyen Thi Thuy, Dinh Viet Sang

**Abstract:** In recent years, computer-aided automatic polyp segmentation and neoplasm detection have been an emerging topic in medical image analysis, providing valuable support to colonoscopy procedures. Attention has been paid to improving the accuracy of polyp detection and segmentation. However, not much focus has been given to the latency and throughput of performing these tasks on dedicated devices, which can be crucial for practical applications. This paper introduces a novel deep neural network architecture called BlazeNeo for the task of polyp segmentation and neoplasm detection, with an emphasis on compactness and speed while maintaining high accuracy. The model leverages the highly efficient HarDNet backbone alongside lightweight Receptive Field Blocks for computational efficiency, and an auxiliary training mechanism to take full advantage of the training data for segmentation quality. Our experiments on a challenging dataset show that BlazeNeo achieves improvements in latency and model size while maintaining comparable accuracy against state-of-the-art methods. When deployed on the Jetson AGX Xavier edge device in INT8 precision, BlazeNeo achieves over 155 fps while yielding the best accuracy among all compared methods.

ApacheJIT: A Large Dataset for Just-In-Time Defect Prediction arxiv:2203.00101 📈 2

Hossein Keshavarz, Meiyappan Nagappan

**Abstract:** In this paper, we present ApacheJIT, a large dataset for Just-In-Time defect prediction. ApacheJIT consists of clean and bug-inducing software changes in popular Apache projects. ApacheJIT has a total of 106,674 commits (28,239 bug-inducing and 78,435 clean commits). Having a large number of commits makes ApacheJIT a suitable dataset for machine learning models, especially deep learning models that require large training sets to effectively generalize the patterns present in the historical data to future data. In addition to the original dataset, we also present carefully selected training and test sets that we recommend to be used in training and evaluating machine learning models.

Amortized Proximal Optimization arxiv:2203.00089 📈 2

Juhan Bae, Paul Vicol, Jeff Z. HaoChen, Roger Grosse

**Abstract:** We propose a framework for online meta-optimization of parameters that govern optimization, called Amortized Proximal Optimization (APO). We first interpret various existing neural network optimizers as approximate stochastic proximal point methods which trade off the current-batch loss with proximity terms in both function space and weight space. The idea behind APO is to amortize the minimization of the proximal point objective by meta-learning the parameters of an update rule. We show how APO can be used to adapt a learning rate or a structured preconditioning matrix. Under appropriate assumptions, APO can recover existing optimizers such as natural gradient descent and KFAC. It enjoys low computational overhead and avoids expensive and numerically sensitive operations required by some second-order optimizers, such as matrix inverses. We empirically test APO for online adaptation of learning rates and structured preconditioning matrices for regression, image reconstruction, image classification, and natural language translation tasks. Empirically, the learning rate schedules found by APO generally outperform optimal fixed learning rates and are competitive with manually tuned decay schedules. Using APO to adapt a structured preconditioning matrix generally results in optimization performance competitive with second-order methods. Moreover, the absence of matrix inversion provides numerical stability, making it effective for low precision training.
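To make the proximal-point idea concrete, here is a minimal sketch, under simplifying assumptions, of choosing a learning rate so that one SGD step minimizes the current-batch loss plus a weight-space proximity term. The grid search over candidate step sizes and the linear-regression setup are illustrative stand-ins; APO itself meta-learns the update-rule parameters (and can also use a function-space term), which is not reproduced here.

```python
# Sketch of the proximal-point idea behind APO (not the authors' code):
# pick the learning rate that, after one SGD step, minimizes
#   current-batch loss + lam_w * ||w_new - w||^2   (weight-space proximity)
import numpy as np

rng = np.random.default_rng(0)
d = 5
w_true = rng.normal(size=d)
w = np.zeros(d)
lam_w = 0.1
lrs = [1e-3, 1e-2, 1e-1, 1.0]           # candidate learning rates

def batch():
    X = rng.normal(size=(32, d))
    y = X @ w_true + 0.1 * rng.normal(size=32)
    return X, y

def loss(w, X, y):
    r = X @ w - y
    return 0.5 * np.mean(r ** 2)

for step in range(200):
    X, y = batch()
    g = X.T @ (X @ w - y) / len(y)       # gradient of the batch loss
    def prox_obj(lr):                    # proximal objective of a candidate step
        w_new = w - lr * g
        return loss(w_new, X, y) + lam_w * np.sum((w_new - w) ** 2)
    lr = min(lrs, key=prox_obj)          # APO would instead meta-learn this
    w = w - lr * g

print("distance to w_true:", np.linalg.norm(w - w_true))
```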

Robust Multi-Agent Bandits Over Undirected Graphs arxiv:2203.00076 📈 2

Daniel Vial, Sanjay Shakkottai, R. Srikant

**Abstract:** We consider a multi-agent multi-armed bandit setting in which $n$ honest agents collaborate over a network to minimize regret but $m$ malicious agents can disrupt learning arbitrarily. Assuming the network is the complete graph, existing algorithms incur $O( (m + K/n) \log (T) / \Delta)$ regret in this setting, where $K$ is the number of arms and $\Delta$ is the arm gap. For $m \ll K$, this improves over the single-agent baseline regret of $O(K\log(T)/\Delta)$. In this work, we show the situation is murkier beyond the case of a complete graph. In particular, we prove that if the state-of-the-art algorithm is used on the undirected line graph, honest agents can suffer (nearly) linear regret until time is doubly exponential in $K$ and $n$. In light of this negative result, we propose a new algorithm for which the $i$-th agent has regret $O( ( d_{\text{mal}}(i) + K/n) \log(T)/\Delta)$ on any connected and undirected graph, where $d_{\text{mal}}(i)$ is the number of $i$'s neighbors who are malicious. Thus, we generalize existing regret bounds beyond the complete graph (where $d_{\text{mal}}(i) = m$), and show the effect of malicious agents is entirely local (in the sense that only the $d_{\text{mal}}(i)$ malicious agents directly connected to $i$ affect its long-term regret).

Proceedings of the Artificial Intelligence for Cyber Security (AICS) Workshop at AAAI 2022 arxiv:2202.14010 📈 2

James Holt, Edward Raff, Ahmad Ridley, Dennis Ross, Arunesh Sinha, Diane Staheli, William Streilen, Milind Tambe, Yevgeniy Vorobeychik, Allan Wollaber

**Abstract:** The workshop will focus on the application of AI to problems in cyber security. Cyber systems generate large volumes of data, and utilizing this effectively is beyond human capabilities. Additionally, adversaries continue to develop new attacks. Hence, AI methods are required to understand and protect the cyber domain. These challenges are widely studied in enterprise networks, but there are many gaps in research and practice, as well as novel problems in other domains. In general, AI techniques are still not widely adopted in the real world. Reasons include: (1) a lack of certification of AI for security, (2) a lack of formal study of the implications of practical constraints (e.g., power, memory, storage) for AI systems in the cyber domain, (3) known vulnerabilities such as evasion and poisoning attacks, (4) a lack of meaningful explanations for security analysts, and (5) a lack of analyst trust in AI solutions. There is a need for the research community to develop novel solutions for these practical issues.

Provably Efficient Convergence of Primal-Dual Actor-Critic with Nonlinear Function Approximation arxiv:2202.13863 📈 2

Jing Dong, Li Shen, Yinggan Xu, Baoxiang Wang

**Abstract:** We study the convergence of the actor-critic algorithm with nonlinear function approximation under a nonconvex-nonconcave primal-dual formulation. Stochastic gradient descent ascent is applied with an adaptive proximal term for robust learning rates. We show the first efficient convergence result for primal-dual actor-critic, with a convergence rate of $\mathcal{O}\left(\sqrt{\frac{\ln \left(N d G^2 \right)}{N}}\right)$ under Markovian sampling, where $G$ is the element-wise maximum of the gradient, $N$ is the number of iterations, and $d$ is the dimension of the gradient. Our result is presented with only the Polyak-Łojasiewicz condition for the dual variables, which is easy to verify and applicable to a wide range of reinforcement learning (RL) scenarios. The algorithm and analysis are general enough to be applied to other RL settings, like multi-agent RL. Empirical results on OpenAI Gym continuous control tasks corroborate our theoretical findings.

Severity classification in cases of Collagen VI-related myopathy with Convolutional Neural Networks and handcrafted texture features arxiv:2202.13853 📈 2

Rafael Rodrigues, Susana Quijano-Roy, Robert-Yves Carlier, Antonio M. G. Pinheiro

**Abstract:** Magnetic Resonance Imaging (MRI) is a non-invasive tool for the clinical assessment of low-prevalence neuromuscular disorders. Automated diagnosis methods might reduce the need for biopsies and provide valuable information on disease follow-up. In this paper, three methods are proposed to classify target muscles in Collagen VI-related myopathy cases, based on their degree of involvement, notably a Convolutional Neural Network, a Fully Connected Network to classify texture features, and a hybrid method combining the two feature sets. The proposed methods were evaluated on axial T1-weighted Turbo Spin-Echo MRI from 26 subjects, including Ullrich Congenital Muscular Dystrophy or Bethlem Myopathy patients at different evolution stages. The best results were obtained with the hybrid model, resulting in a global accuracy of 93.8\%, and F-scores of 0.99, 0.82, and 0.95, for healthy, mild and moderate/severe cases, respectively.

Magnitude-aware Probabilistic Speaker Embeddings arxiv:2202.13826 📈 2

Nikita Kuzmin, Igor Fedorov, Alexey Sholokhov

**Abstract:** Recently, hyperspherical embeddings have established themselves as a dominant technique for face and voice recognition. Specifically, Euclidean space vector embeddings are learned to encode person-specific information in their direction while ignoring the magnitude. However, recent studies have shown that the magnitudes of the embeddings extracted by deep neural networks may indicate the quality of the corresponding inputs. This paper explores the properties of the magnitudes of the embeddings related to quality assessment and out-of-distribution detection. We propose a new probabilistic speaker embedding extractor using the information encoded in the embedding magnitude and leverage it in the speaker verification pipeline. We also propose several quality-aware diarization methods and incorporate the magnitudes in those. Our results indicate significant improvements over magnitude-agnostic baselines both in speaker verification and diarization tasks.
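As a toy illustration of using embedding magnitude as a quality signal (not the paper's probabilistic extractor), the sketch below weights enrollment embeddings by their norms when building a speaker centroid for verification scoring; the dimensionality and weighting scheme are assumptions:

```python
# Hypothetical sketch: treat the norm of a deep speaker embedding as a
# quality/confidence proxy and down-weight low-magnitude embeddings when
# scoring a verification trial.
import numpy as np

def score_trial(enroll_embs, test_emb):
    """Quality-weighted cosine score between enrollment embeddings and one test embedding."""
    mags = np.linalg.norm(enroll_embs, axis=1)          # quality proxy
    dirs = enroll_embs / mags[:, None]                  # unit directions
    centroid = np.average(dirs, axis=0, weights=mags)   # magnitude-weighted mean direction
    centroid /= np.linalg.norm(centroid)
    t = test_emb / np.linalg.norm(test_emb)
    return float(centroid @ t)

rng = np.random.default_rng(1)
enroll = rng.normal(size=(3, 192)) + 2.0                # toy 192-d embeddings
test = rng.normal(size=192) + 2.0
print(round(score_trial(enroll, test), 3))
```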

TraceNet: Tracing and Locating the Key Elements in Sentiment Analysis arxiv:2202.13812 📈 2

Qinghua Zhao, Shuai Ma

**Abstract:** In this paper, we study sentiment analysis tasks where the outcomes are mainly contributed by a few key elements of the inputs. Motivated by the two-streams hypothesis, we propose a neural architecture, named TraceNet, to address this type of task. It not only learns discriminative representations for the target task via its encoders, but also traces key elements at the same time via its locators. In TraceNet, both encoders and locators are organized in a layer-wise manner, and a smoothness regularization is employed between adjacent encoder-locator combinations. Moreover, sparsity constraints are enforced on locators for tracing purposes, and items are proactively masked according to the item weights output by the locators. A major advantage of TraceNet is that the outcomes are easier to understand, since the most responsible parts of the inputs are identified. Also, under the guidance of locators, it is more robust to attacks due to its focus on key elements and the proactive masking training strategy. Experimental results show its effectiveness for sentiment classification. Moreover, we provide several case studies to demonstrate its robustness and interpretability.

Estimating Model Performance on External Samples from Their Limited Statistical Characteristics arxiv:2202.13683 📈 2

Tal El-Hay, Chen Yanover

**Abstract:** Methods that address data shifts usually assume full access to multiple datasets. In the healthcare domain, however, privacy-preserving regulations as well as commercial interests limit data availability and, as a result, researchers can typically study only a small number of datasets. In contrast, limited statistical characteristics of specific patient samples are much easier to share and may be available from previously published literature or focused collaborative efforts. Here, we propose a method that estimates model performance in external samples from their limited statistical characteristics. We search for weights that induce internal statistics that are similar to the external ones; and that are closest to uniform. We then use model performance on the weighted internal sample as an estimation for the external counterpart. We evaluate the proposed algorithm on simulated data as well as electronic medical record data for two risk models, predicting complications in ulcerative colitis patients and stroke in women diagnosed with atrial fibrillation. In the vast majority of cases, the estimated external performance is much closer to the actual one than the internal performance. Our proposed method may be an important building block in training robust models and detecting potential model failures in external environments.
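A minimal sketch of the weighting idea, under assumptions: reweight an internal sample so that its weighted feature means match published external means while staying close to uniform weights, then report the weighted AUC as the external estimate. The penalty weights, optimizer, and toy data below are illustrative, not the authors' procedure.

```python
# Sketch: weight internal samples to match limited external statistics, stay
# close to uniform weights, and report the weighted AUC as an external estimate.
import numpy as np
from scipy.optimize import minimize
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n, d = 500, 3
X = rng.normal(size=(n, d))                        # internal features
y = (X[:, 0] + 0.5 * rng.normal(size=n) > 0).astype(int)
scores = X[:, 0] + 0.3 * rng.normal(size=n)        # some model's risk scores
external_means = np.array([0.4, -0.2, 0.1])         # limited external statistics (hypothetical)

def objective(logits):
    w = np.exp(logits); w /= w.sum()
    moment_gap = X.T @ w - external_means           # match external feature means
    uniform_gap = w - 1.0 / n                       # stay close to uniform
    return 100.0 * np.sum(moment_gap ** 2) + np.sum(uniform_gap ** 2)

res = minimize(objective, np.zeros(n), method="L-BFGS-B")
w = np.exp(res.x); w /= w.sum()
print("weighted AUC (external estimate):", round(roc_auc_score(y, scores, sample_weight=w), 3))
print("plain internal AUC:              ", round(roc_auc_score(y, scores), 3))
```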

GPU-Accelerated Policy Optimization via Batch Automatic Differentiation of Gaussian Processes for Real-World Control arxiv:2202.13638 📈 2

Abdolreza Taheri, Joni Pajarinen, Reza Ghabcheloo

**Abstract:** The ability of Gaussian processes (GPs) to predict the behavior of dynamical systems as a more sample-efficient alternative to parametric models seems promising for real-world robotics research. However, the computational complexity of GPs has made policy search a highly time and memory consuming process that has not been able to scale to larger problems. In this work, we develop a policy optimization method by leveraging fast predictive sampling methods to process batches of trajectories in every forward pass, and compute gradient updates over policy parameters by automatic differentiation of Monte Carlo evaluations, all on GPU. We demonstrate the effectiveness of our approach in training policies on a set of reference-tracking control experiments with a heavy-duty machine. Benchmark results show a significant speedup over exact methods and showcase the scalability of our method to larger policy networks, longer horizons, and up to thousands of trajectories with a sublinear drop in speed.

Deep learning enhanced Rydberg multifrequency microwave recognition arxiv:2202.13617 📈 2

Zong-Kai Liu, Li-Hua Zhang, Bang Liu, Zheng-Yuan Zhang, Guang-Can Guo, Dong-Sheng Ding, Bao-Sen Shi

**Abstract:** Recognition of multifrequency microwave (MW) electric fields is challenging because of the complex interference of multifrequency fields in practical applications. Rydberg atom-based measurement of multifrequency MW electric fields is promising for MW radar and MW communications. However, Rydberg atoms are sensitive not only to the MW signal but also to noise from atomic collisions and the environment, meaning that solution of the governing Lindblad master equation of light-atom interactions is complicated by the inclusion of noise and high-order terms. Here, we solve these problems by combining Rydberg atoms with a deep learning model, demonstrating that this model uses the sensitivity of the Rydberg atoms while also reducing the impact of noise without solving the master equation. As a proof-of-principle demonstration, the deep learning enhanced Rydberg receiver allows direct decoding of the frequency-division multiplexed (FDM) signal. This type of sensing technology is expected to benefit Rydberg-based MW field sensing and communication.

Semi-supervised Learning on Large Graphs: is Poisson Learning a Game-Changer? arxiv:2202.13608 📈 2

Canh Hao Nguyen

**Abstract:** We examine Poisson learning for graph-based semi-supervised learning to see whether it can avoid the global information loss problem that Laplace-based learning methods suffer from on large graphs. Our analysis shows that Poisson learning is simply Laplace regularization with thresholding and cannot overcome this problem.

Generalizable task representation learning from human demonstration videos: a geometric approach arxiv:2202.13604 📈 2

Jun Jin, Martin Jagersand

**Abstract:** We study the problem of generalizable task learning from human demonstration videos without extra training on the robot or pre-recorded robot motions. Given a set of human demonstration videos showing a task with different objects/tools (categorical objects), we aim to learn a representation of visual observation that generalizes to categorical objects and enables efficient controller design. We propose to introduce a geometric task structure to the representation learning problem that geometrically encodes the task specification from human demonstration videos, and that enables generalization by building task specification correspondence between categorical objects. Specifically, we propose CoVGS-IL, which uses a graph-structured task function to learn task representations under structural constraints. Our method enables task generalization by selecting geometric features from different objects whose inner connection relationships define the same task in geometric constraints. The learned task representation is then transferred to a robot controller using uncalibrated visual servoing (UVS); thus, the need for extra robot training or pre-recorded robot motions is removed.

LCP-dropout: Compression-based Multiple Subword Segmentation for Neural Machine Translation arxiv:2202.13590 📈 2

Keita Nonaka, Kazutaka Yamanouchi, Tomohiro I, Tsuyoshi Okita, Kazutaka Shimada, Hiroshi Sakamoto

**Abstract:** In this study, we propose a simple and effective preprocessing method for subword segmentation based on a data compression algorithm. Compression-based subword segmentation has recently attracted significant attention as a preprocessing method for training data in Neural Machine Translation. Among these methods, BPE/BPE-dropout is one of the fastest and most effective compared to conventional approaches. However, the compression-based approach has a drawback in that generating multiple segmentations is difficult due to its determinism. To overcome this difficulty, we focus on a probabilistic string algorithm, called locally-consistent parsing (LCP), that has been applied to achieve optimum compression. Employing the probabilistic mechanism of LCP, we propose LCP-dropout for multiple subword segmentation, which improves on BPE/BPE-dropout, and show that it outperforms various baselines, especially when learning from small training data.
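For context on the baseline that LCP-dropout improves on, here is a minimal sketch of the BPE-dropout mechanism for producing multiple segmentations (the paper's LCP-based procedure itself is not reproduced); the toy merge table and drop probability are hypothetical:

```python
# Sketch of BPE-dropout-style stochastic segmentation: each learned merge is
# skipped with probability p, so the same word yields multiple segmentations.
import random

merges = [("l", "o"), ("lo", "w"), ("e", "r"), ("low", "er")]  # toy merge table, priority order

def segment(word, p_drop=0.3, seed=None):
    rng = random.Random(seed)
    pieces = list(word)
    for a, b in merges:
        i = 0
        while i < len(pieces) - 1:
            if pieces[i] == a and pieces[i + 1] == b and rng.random() > p_drop:
                pieces[i:i + 2] = [a + b]     # apply the merge (kept this time)
            else:
                i += 1                        # merge dropped or not applicable
    return pieces

for s in range(3):
    print(segment("lower", p_drop=0.3, seed=s))   # different runs, different segmentations
```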

Using Multi-scale SwinTransformer-HTC with Data augmentation in CoNIC Challenge arxiv:2202.13588 📈 2

Chia-Yen Lee, Hsiang-Chin Chien, Ching-Ping Wang, Hong Yen, Kai-Wen Zhen, Hong-Kun Lin

**Abstract:** Colorectal cancer is one of the most common cancers worldwide, so early pathological examination is very important. However, it is time-consuming and labor-intensive to identify the number and type of cells in H&E images in clinical practice. Therefore, the CoNIC Challenge 2022 proposes the task of automatic segmentation, classification, and counting of the cellular composition of H&E images from pathological sections. We propose a multi-scale Swin transformer with HTC for this challenge, and also apply known normalization methods to generate additional augmented data. Finally, our results show that the multi-scale design played a crucial role in identifying features at different scales and that the augmentation improved the model's recognition performance.

Rethinking and Refining the Distinct Metric arxiv:2202.13587 📈 2

Siyang Liu, Sahand Sabour, Yinhe Zheng, Pei Ke, Xiaoyan Zhu, Minlie Huang

**Abstract:** Distinct is a widely used automatic metric for evaluating the diversity of language generation tasks. However, we observe that the original approach to calculating distinct scores has evident biases that tend to add higher penalties to longer sequences. In this paper, we refine the calculation of distinct scores by re-scaling the number of distinct tokens based on its expectation. We provide both empirical and theoretical evidence to show that our method effectively removes the biases exhibited in the original distinct score. Further analyses also demonstrate that the refined score correlates better with human evaluations.
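A small sketch of the rescaling idea as we read it from the abstract: divide the distinct-token count by its expectation under uniform sampling from the vocabulary, so longer outputs are not penalized merely for length. The exact expectation formula used below is our assumption, not necessarily the paper's final definition.

```python
# Original distinct vs. an expectation-rescaled variant (illustrative formula).
def distinct_original(tokens):
    return len(set(tokens)) / len(tokens)

def distinct_rescaled(tokens, vocab_size):
    c = len(tokens)
    # Expected number of distinct tokens when drawing c tokens uniformly from a
    # vocabulary of size V: V * (1 - (1 - 1/V)^c).
    expected_distinct = vocab_size * (1.0 - (1.0 - 1.0 / vocab_size) ** c)
    return len(set(tokens)) / expected_distinct

short = ["the", "cat", "sat"]
long_ = ["the", "cat", "sat", "on", "a", "mat", "near", "the", "old", "door"]
V = 50
print(distinct_original(short), distinct_original(long_))                # penalizes the longer output
print(round(distinct_rescaled(short, V), 3), round(distinct_rescaled(long_, V), 3))
```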

ConvNeXt-backbone HoVerNet for nuclei segmentation and classification arxiv:2202.13560 📈 2

Jiachen Li, Chixin Wang, Banban Huang, Zekun Zhou

**Abstract:** This manuscript gives a brief description of the algorithm used to participate in the CoNIC Challenge 2022. We first tried Deeplab-v3+ and Swin-Transformer for semantic segmentation. After the baseline was made available, we followed its method and replaced the ResNet backbone with a ConvNeXt one. Results on the validation set show that, even with significantly fewer channels in each stage, it still improves mPQ+ by 0.04 and multi r2 by 0.0144.

Semi-supervised Nonnegative Matrix Factorization for Document Classification arxiv:2203.03551 📈 1

Jamie Haddock, Lara Kassab, Sixian Li, Alona Kryshchenko, Rachel Grotheer, Elena Sizikova, Chuntian Wang, Thomas Merkh, RWMA Madushani, Miju Ahn, Deanna Needell, Kathryn Leonard

**Abstract:** We propose new semi-supervised nonnegative matrix factorization (SSNMF) models for document classification and provide motivation for these models as maximum likelihood estimators. The proposed SSNMF models simultaneously provide both a topic model and a model for classification, thereby offering highly interpretable classification results. We derive training methods using multiplicative updates for each new model, and demonstrate the application of these models to single-label and multi-label document classification, although the models are flexible to other supervised learning tasks such as regression. We illustrate the promise of these models and training methods on document classification datasets (e.g., 20 Newsgroups, Reuters).
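As a rough sketch of how multiplicative updates look for a Frobenius-norm SSNMF (an illustrative variant, not necessarily the exact models proposed in the paper), with X a term-document matrix and Y a one-hot label matrix:

```python
# Multiplicative updates for  min ||X - A S||^2 + lam * ||Y - B S||^2,  A,B,S >= 0.
import numpy as np

def ssnmf(X, Y, k, lam=1.0, iters=200, eps=1e-9, seed=0):
    rng = np.random.default_rng(seed)
    m, n = X.shape
    c = Y.shape[0]
    A = rng.random((m, k)); B = rng.random((c, k)); S = rng.random((k, n))
    for _ in range(iters):
        A *= (X @ S.T) / (A @ S @ S.T + eps)
        B *= (Y @ S.T) / (B @ S @ S.T + eps)
        S *= (A.T @ X + lam * B.T @ Y) / (A.T @ A @ S + lam * B.T @ B @ S + eps)
    return A, B, S

# Toy usage: 6 documents, 4 terms, 2 classes.
X = np.array([[3, 2, 0, 0, 1, 0],
              [2, 3, 1, 0, 0, 0],
              [0, 0, 3, 2, 0, 1],
              [0, 1, 2, 3, 0, 0]], dtype=float)
Y = np.array([[1, 1, 0, 0, 1, 0],
              [0, 0, 1, 1, 0, 1]], dtype=float)
A, B, S = ssnmf(X, Y, k=2)
print("predicted class per document:", (B @ S).argmax(axis=0))
```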

Analysis of Digitalized ECG Signals Based on Artificial Intelligence and Spectral Analysis Methods Specialized in ARVC arxiv:2203.00504 📈 1

Vasileios E. Papageorgiou, Thomas Zegkos, Georgios Efthimiadis, George Tsaklidis

**Abstract:** Arrhythmogenic right ventricular cardiomyopathy (ARVC) is an inherited heart muscle disease that appears between the second and fourth decade of a patient's life, being responsible for 20% of sudden cardiac deaths before the age of 35. The effective and punctual diagnosis of this disease based on Electrocardiograms (ECGs) could have a vital role in reducing premature cardiovascular mortality. In our analysis, we first outline the digitalization process of paper-based ECG signals, enhanced by a spatial filter aiming to eliminate dark regions in the dataset's images that do not correspond to the ECG waveform and would produce undesirable noise. Next, we propose the utilization of a low-complexity convolutional neural network for the detection of an arrhythmogenic heart disease that has not been studied with deep learning methodology to date, achieving high classification accuracy on a disease whose major identification criterion is millivolt-scale variations in the ECG's morphology, in contrast with other arrhythmogenic abnormalities. Finally, by performing spectral analysis we investigate significant differences in the frequency domain between normal ECGs and ECGs corresponding to patients suffering from ARVC. The overall research carried out in this article highlights the importance of integrating mathematical methods into the examination and effective diagnosis of various diseases, aiming at a substantial contribution to their successful treatment.

On the sample complexity of stabilizing linear dynamical systems from data arxiv:2203.00474 📈 1

Steffen W. R. Werner, Benjamin Peherstorfer

**Abstract:** Learning controllers from data for stabilizing dynamical systems typically follows a two-step process of first identifying a model and then constructing a controller based on the identified model. However, learning a model means identifying a generic description of the system's dynamics, which can require large amounts of data and involve extracting information that is unnecessary for the specific task of stabilization. The contribution of this work is to show that if a linear dynamical system has dimension (McMillan degree) $n$, then there always exist $n$ states from which a stabilizing feedback controller can be constructed, independent of the dimension of the representation of the observed states and the number of inputs. By building on previous work, this finding implies that any linear dynamical system can be stabilized from fewer observed states than the minimal number of states required for learning a model of the dynamics. The theoretical findings are demonstrated with numerical experiments that show the stabilization of the flow behind a cylinder from less data than necessary for learning a model.

NeuRecover: Regression-Controlled Repair of Deep Neural Networks with Training History arxiv:2203.00191 📈 1

Shogo Tokui, Susumu Tokumoto, Akihito Yoshii, Fuyuki Ishikawa, Takao Nakagawa, Kazuki Munakata, Shinji Kikuchi

**Abstract:** Systematic techniques to improve the quality of deep neural networks (DNNs) are critical given the increasing demand for practical applications, including safety-critical ones. The key challenge comes from the limited controllability in updating DNNs. Retraining to fix some behavior often has a destructive impact on other behavior, causing regressions, i.e., the updated DNN fails with inputs correctly handled by the original one. This problem is crucial when engineers are required to investigate failures in intensive assurance activities for safety or trust. Search-based repair techniques for DNNs have the potential to tackle this challenge by enabling localized updates only on "responsible parameters" inside the DNN. However, this potential has not been exploited to achieve sufficient controllability to suppress regressions in DNN repair tasks. In this paper, we propose a novel DNN repair method that makes use of the training history for judging which DNN parameters should be changed or not to suppress regressions. We implemented the method in a tool called NeuRecover and evaluated it with three datasets. Our method outperformed the existing method, often achieving less than a quarter, and in some cases a tenth, of the number of regressions. Our method is especially effective when the repair requirements are tight to fix specific failure types. In such cases, our method showed stably low rates (<2%) of regressions, which were in many cases a tenth of the regressions caused by retraining.

When AUC meets DRO: Optimizing Partial AUC for Deep Learning with Non-Convex Convergence Guarantee arxiv:2203.00176 📈 1

Dixian Zhu, Gang Li, Bokun Wang, Xiaodong Wu, Tianbao Yang

**Abstract:** In this paper, we propose systematic and efficient gradient-based methods for both one-way and two-way partial AUC (pAUC) maximization that are applicable to deep learning. We propose new formulations of pAUC surrogate objectives by using the distributionally robust optimization (DRO) to define the loss for each individual positive data. We consider two formulations of DRO, one of which is based on conditional-value-at-risk (CVaR) that yields a non-smooth but exact estimator for pAUC, and another one is based on a KL divergence regularized DRO that yields an inexact but smooth (soft) estimator for pAUC. For both one-way and two-way pAUC maximization, we propose two algorithms and prove their convergence for optimizing their two formulations, respectively. Experiments demonstrate the effectiveness of the proposed algorithms for pAUC maximization for deep learning on various datasets.
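A minimal sketch of the CVaR-style one-way pAUC surrogate described in the abstract: for each positive example, average a pairwise surrogate loss over only the hardest fraction of negatives. The squared-hinge loss and plain averaging below are illustrative choices, not the paper's exact algorithm:

```python
# One-way pAUC surrogate: per positive, keep only the top-k pairwise losses
# against negatives (a CVaR over negatives at level beta).
import numpy as np

def pauc_cvar_surrogate(pos_scores, neg_scores, beta=0.1, margin=1.0):
    k = max(1, int(beta * len(neg_scores)))           # CVaR level -> top-k hardest negatives
    losses = []
    for sp in pos_scores:
        pair = np.maximum(0.0, margin - (sp - neg_scores)) ** 2   # squared hinge per negative
        hardest = np.sort(pair)[-k:]                  # keep only the k largest losses
        losses.append(hardest.mean())
    return float(np.mean(losses))

rng = np.random.default_rng(0)
pos = rng.normal(1.0, 1.0, size=50)
neg = rng.normal(0.0, 1.0, size=500)
print(round(pauc_cvar_surrogate(pos, neg, beta=0.1), 3))
```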

On Testability of the Front-Door Model via Verma Constraints arxiv:2203.00161 📈 1

Rohit Bhattacharya, Razieh Nabi

**Abstract:** The front-door criterion can be used to identify and compute causal effects despite the existence of unmeasured confounders between a treatment and outcome. However, the key assumptions -- (i) the existence of a variable (or set of variables) that fully mediates the effect of the treatment on the outcome, and (ii) which simultaneously does not suffer from similar issues of confounding as the treatment-outcome pair -- are often deemed implausible. This paper explores the testability of these assumptions. We show that under mild conditions involving an auxiliary variable, the assumptions encoded in the front-door model (and simple extensions of it) may be tested via generalized equality constraints a.k.a Verma constraints. We propose two goodness-of-fit tests based on this observation, and evaluate the efficacy of our proposal on real and synthetic data. We also provide theoretical and empirical comparisons to instrumental variable approaches to handling unmeasured confounding.

On Testability and Goodness of Fit Tests in Missing Data Models arxiv:2203.00132 📈 1

Razieh Nabi, Rohit Bhattacharya

**Abstract:** Significant progress has been made in developing identification and estimation techniques for missing data problems where modeling assumptions can be described via a directed acyclic graph. The validity of results using such techniques relies on the assumptions encoded by the graph holding true; however, verification of these assumptions has not received sufficient attention in prior work. In this paper, we provide new insights on the testable implications of three broad classes of missing data graphical models, and design goodness-of-fit tests around them. The classes of models explored are: sequential missing-at-random and missing-not-at-random models, which can be used for modeling longitudinal studies with dropout/censoring, and a kind of no self-censoring model, which can be applied to cross-sectional studies and surveys.

Learning Low-Dimensional Nonlinear Structures from High-Dimensional Noisy Data: An Integral Operator Approach arxiv:2203.00126 📈 1

Xiucai Ding, Rong Ma

**Abstract:** We propose a kernel-spectral embedding algorithm for learning low-dimensional nonlinear structures from high-dimensional and noisy observations, where the datasets are assumed to be sampled from an intrinsically low-dimensional manifold and corrupted by high-dimensional noise. The algorithm employs an adaptive bandwidth selection procedure which does not rely on prior knowledge of the underlying manifold. The obtained low-dimensional embeddings can be further utilized for downstream purposes such as data visualization, clustering and prediction. Our method is theoretically justified and practically interpretable. Specifically, we establish the convergence of the final embeddings to their noiseless counterparts when the dimension and size of the samples are comparably large, and characterize the effect of the signal-to-noise ratio on the rate of convergence and phase transition. We also prove convergence of the embeddings to the eigenfunctions of an integral operator defined by the kernel map of some reproducing kernel Hilbert space capturing the underlying nonlinear structures. Numerical simulations and analysis of three real datasets show the superior empirical performance of the proposed method, compared to many existing methods, on learning various manifolds in diverse applications.

RouteNet-Erlang: A Graph Neural Network for Network Performance Evaluation arxiv:2202.13956 📈 1

Miquel Ferriol-Galmés, Krzysztof Rusek, José Suárez-Varela, Shihan Xiao, Xiangle Cheng, Pere Barlet-Ros, Albert Cabellos-Aparicio

**Abstract:** Network modeling is a fundamental tool in network research, design, and operation. Arguably the most popular method for modeling is Queuing Theory (QT). Its main limitation is that it imposes strong assumptions on the packet arrival process, which typically do not hold in real networks. In the field of Deep Learning, Graph Neural Networks (GNN) have emerged as a new technique to build data-driven models that can learn complex and non-linear behavior. In this paper, we present \emph{RouteNet-Erlang}, a pioneering GNN architecture designed to model computer networks. RouteNet-Erlang supports complex traffic models, multi-queue scheduling policies, routing policies and can provide accurate estimates in networks not seen in the training phase. We benchmark RouteNet-Erlang against a state-of-the-art QT model, and our results show that it outperforms QT in all the network scenarios.

Numeric Lyndon-based feature embedding of sequencing reads for machine learning approaches arxiv:2202.13884 📈 1

Paola Bonizzoni, Matteo Costantini, Clelia De Felice, Alessia Petescia, Yuri Pirola, Marco Previtali, Raffaella Rizzi, Jens Stoye, Rocco Zaccagnino, Rosalba Zizza

**Abstract:** Feature embedding methods have been proposed in the literature to represent sequences as numeric vectors to be used in bioinformatics investigations, such as family classification and protein structure prediction. Recent theoretical results showed that the well-known Lyndon factorization preserves common factors in overlapping strings. Surprisingly, the fingerprint of a sequencing read, which is the sequence of lengths of consecutive factors in variants of the Lyndon factorization of the read, is effective in preserving sequence similarities, suggesting it as a basis for the definition of novel representations of sequencing reads. We propose a novel feature embedding method for Next-Generation Sequencing (NGS) data using the notion of fingerprint. We provide a theoretical and experimental framework to estimate the behaviour of fingerprints and of the k-mers extracted from them, called k-fingers, as possible feature embeddings for sequencing reads. As a case study to assess the effectiveness of such embeddings, we use fingerprints to represent RNA-Seq reads and to assign them to the most likely gene from which they originated as fragments of transcripts. We provide an implementation of the proposed method in the tool lyn2vec, which produces Lyndon-based feature embeddings of sequencing reads.
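For readers unfamiliar with the fingerprint notion, the sketch below computes the classical Lyndon factorization with Duval's algorithm and takes the factor lengths as the fingerprint; the paper works with variants of the factorization (and with k-fingers), which are not reproduced here:

```python
def lyndon_factorization(s):
    """Duval's algorithm: factor s into a non-increasing sequence of Lyndon words."""
    factors = []
    i, n = 0, len(s)
    while i < n:
        j, k = i + 1, i
        while j < n and s[k] <= s[j]:
            if s[k] < s[j]:
                k = i
            else:
                k += 1
            j += 1
        while i <= k:
            factors.append(s[i:i + j - k])
            i += j - k
    return factors

def fingerprint(read):
    """Fingerprint of a read: lengths of its consecutive Lyndon factors."""
    return [len(f) for f in lyndon_factorization(read)]

print(lyndon_factorization("banana"))   # ['b', 'an', 'an', 'a']
print(fingerprint("ACGTACGTGG"))
```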

Monkey Business: Reinforcement learning meets neighborhood search for Virtual Network Embedding arxiv:2202.13706 📈 1

Maxime Elkael, Massinissa Ait Aba, Andrea Araldo, Hind Castel, Badii Jouaber

**Abstract:** In this article, we consider the Virtual Network Embedding (VNE) problem for 5G network slicing. This problem requires allocating multiple Virtual Networks (VNs) on a substrate virtualized physical network while maximizing, among others, resource utilization, the maximum number of placed VNs, and the network operator's benefit. We solve the online version of the problem, where slices arrive over time. Inspired by the Nested Rollout Policy Adaptation (NRPA) algorithm, a variant of the well-known Monte Carlo Tree Search (MCTS) that learns how to perform good simulations over time, we propose a new algorithm that we call Neighborhood Enhanced Policy Adaptation (NEPA). The key feature of our algorithm is the observation that NRPA cannot exploit knowledge acquired in one branch of the state tree for another one which starts differently. NEPA learns by combining NRPA with neighborhood search in a frugal manner that improves only promising solutions while keeping the running time low. We call this technique monkey business because it comes down to jumping from one interesting branch to another, similar to how monkeys jump from tree to tree instead of going down every time. NEPA achieves better results in terms of acceptance ratio and revenue-to-cost ratio compared to other state-of-the-art algorithms, both on real and synthetic topologies.

WSLRec: Weakly Supervised Learning for Neural Sequential Recommendation Models arxiv:2202.13616 📈 1

Jingwei Zhuo, Bin Liu, Xiang Li, Han Zhu, Xiaoqiang Zhu

**Abstract:** Learning the user-item relevance hidden in implicit feedback data plays an important role in modern recommender systems. Neural sequential recommendation models, which formulate learning the user-item relevance as a sequential classification problem to distinguish items in future behaviors from others based on the user's historical behaviors, have attracted a lot of interest in both industry and academia due to their substantial practical value. Though these models have achieved many practical successes, we argue that the intrinsic **incompleteness** and **inaccuracy** of user behaviors in implicit feedback data is ignored, and we conduct preliminary experiments to support our claims. Motivated by the observation that model-free methods like behavioral retargeting (BR) and item-based collaborative filtering (ItemCF) hit different parts of the user-item relevance compared to neural sequential recommendation models, we propose a novel model-agnostic training approach called WSLRec, which adopts a three-stage framework: pre-training, top-$k$ mining, and fine-tuning. WSLRec resolves the incompleteness problem by pre-training models on extra weak supervision from model-free methods like BR and ItemCF, and resolves the inaccuracy problem by leveraging the top-$k$ mining to screen out reliable user-item relevance from weak supervision for fine-tuning. Experiments on two benchmark datasets and online A/B tests verify the rationality of our claims and demonstrate the effectiveness of WSLRec.

Prepare for Trouble and Make it Double. Supervised and Unsupervised Stacking for AnomalyBased Intrusion Detection arxiv:2202.13611 📈 1

Tommaso Zoppi, Andrea Ceccarelli

**Abstract:** In the last decades, researchers, practitioners and companies have struggled to devise mechanisms to detect malicious activities originating security threats. Amongst the many solutions, network intrusion detection emerged as one of the most popular: it analyzes network traffic and detects ongoing intrusions based on rules or by means of Machine Learners (MLs), which process such traffic and learn a model to suspect intrusions. Supervised MLs are very effective in detecting known threats, but struggle in identifying zero-day attacks (unknown during the learning phase), which instead can be detected through unsupervised MLs. Unfortunately, there are no definitive answers on the combined use of both approaches for network intrusion detection. In this paper we first expand on the problem of zero-day attacks and motivate the need to combine supervised and unsupervised algorithms. We propose the adoption of meta-learning, in the form of a two-layer Stacker, to create a mixed approach that detects both known and unknown threats. Then we implement and empirically evaluate our Stacker through an experimental campaign that allows us to i) debate on meta-features crafted through unsupervised base-level learners, ii) elect the most promising supervised meta-level classifiers, and iii) benchmark classification scores of the Stacker with respect to supervised and unsupervised classifiers. Last, we compare our solution with existing works from the recent literature. Overall, our Stacker reduces misclassifications with respect to (un)supervised ML algorithms in all the 7 public datasets we considered, and outperforms existing studies in 6 out of those 7 datasets. In particular, it turns out to be more effective in detecting zero-day attacks than supervised algorithms, limiting their main weakness while still maintaining adequate capabilities in detecting known attacks.
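A minimal sketch of a two-layer stacker in the spirit described: unsupervised detectors supply anomaly scores as meta-features, and a supervised meta-level classifier combines them with the original features. The specific detectors, classifier, and synthetic data below are illustrative assumptions, not the paper's setup:

```python
# Base level: unsupervised anomaly detectors; meta level: supervised classifier
# trained on the original features plus the anomaly scores.
import numpy as np
from sklearn.ensemble import IsolationForest, RandomForestClassifier
from sklearn.neighbors import LocalOutlierFactor
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=2000, n_features=20, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0, stratify=y)

iso = IsolationForest(random_state=0).fit(X_tr)
lof = LocalOutlierFactor(novelty=True).fit(X_tr)

def meta_features(X):
    # (for simplicity we also score the training data, which sklearn discourages for LOF)
    scores = np.column_stack([iso.decision_function(X), lof.decision_function(X)])
    return np.hstack([X, scores])

meta = RandomForestClassifier(random_state=0).fit(meta_features(X_tr), y_tr)
print("stacker accuracy:", round(meta.score(meta_features(X_te), y_te), 3))
```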

Learning Parameters for a Generalized Vidale-Wolfe Response Model with Flexible Ad Elasticity and Word-of-Mouth arxiv:2202.13566 📈 1

Yanwu Yang, Baozhu Feng, Daniel Zeng

**Abstract:** In this research, we investigate a generalized form of the Vidale-Wolfe (GVW) model. One key element of our modeling work is that the GVW model contains two useful indexes representing the advertiser's elasticity and the word-of-mouth (WoM) effect, respectively. Moreover, we discuss some desirable properties of the GVW model and present a deep neural network (DNN)-based estimation method to learn its parameters. Furthermore, based on three real-world datasets, we conduct computational experiments to validate the GVW model and the identified properties. In addition, we also discuss potential advantages of the GVW model over econometric models. The research outcome shows that both the ad elasticity index and the WoM index have significant influences on advertising responses, and that the GVW model has potential advantages over econometric models of advertising, in terms of several interesting phenomena drawn from practical advertising situations. The GVW model and its deep learning-based estimation method provide a basis to support big data-driven advertising analytics and decision making; meanwhile, the identified properties and experimental findings of this research illuminate critical managerial insights for advertisers in various advertising forms.
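To give a feel for a Vidale-Wolfe-style response curve, the sketch below simulates a hypothetical extension with an ad-elasticity exponent and a word-of-mouth term; the functional form and parameter values are our assumptions and may differ from the paper's GVW specification:

```python
# Hypothetical Vidale-Wolfe-style response model with an elasticity exponent
# (alpha) and a word-of-mouth spillover term (gamma); simulated with Euler steps.
import numpy as np

def simulate(ad_spend, M=1000.0, beta=0.8, alpha=0.7, gamma=0.05, decay=0.1, dt=1.0):
    S = 0.0
    sales = []
    for a in ad_spend:
        dS = (beta * a ** alpha * (M - S) / M      # diminishing-returns ad response
              + gamma * S * (M - S) / M            # word-of-mouth spillover
              - decay * S)                          # natural decay of response
        S = max(0.0, S + dt * dS)
        sales.append(S)
    return np.array(sales)

spend = np.concatenate([np.full(20, 50.0), np.zeros(20)])   # advertising burst, then stop
print(np.round(simulate(spend)[[0, 19, 39]], 1))            # response rises, then decays
```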

Machine Learning Empowered Intelligent Data Center Networking: A Survey arxiv:2202.13549 📈 1

Bo Li, Ting Wang, Peng Yang, Mingsong Chen, Shui Yu, Mounir Hamdi

**Abstract:** To support the needs of ever-growing cloud-based services, the number of servers and network devices in data centers is increasing exponentially, which in turn results in high complexity and difficulty in network optimization. To address these challenges, both academia and industry turn to artificial intelligence technology to realize network intelligence. To this end, a considerable number of novel and creative machine learning-based (ML-based) research works have been put forward in recent years. Nevertheless, there are still enormous challenges facing the intelligent optimization of data center networks (DCNs), especially in the scenario of online real-time dynamic processing of massive heterogeneous services and traffic data. To the best of our knowledge, there is a lack of systematic and comprehensive investigations with in-depth analysis of intelligent DCNs. To this end, in this paper, we comprehensively investigate the application of machine learning to data center networking, and provide a general overview and in-depth analysis of recent works, covering flow prediction, flow classification, load balancing, resource management, routing optimization, and congestion control. In order to provide a multi-dimensional and multi-perspective comparison of various solutions, we design a quality assessment criterion called REBEL-3S to impartially measure the strengths and weaknesses of these research works. Moreover, we also present unique insights into the technological evolution of the fusion of data center networks and machine learning, together with some challenges and potential future research opportunities.

Cognitive Diagnosis with Explicit Student Vector Estimation and Unsupervised Question Matrix Learning arxiv:2203.03722 📈 0

Lu Dong, Zhenhua Ling, Qiang Ling, Zefeng Lai

**Abstract:** Cognitive diagnosis is an essential task in many educational applications. Many solutions have been designed in the literature. The deterministic input, noisy "and" gate (DINA) model is a classical cognitive diagnosis model and can provide interpretable cognitive parameters, e.g., student vectors. However, the assumption of the probabilistic part of DINA is too strong, because it assumes that the slip and guess rates of questions are student-independent. Besides, the question matrix (i.e., Q-matrix) recording the skill distribution of the questions in the cognitive diagnosis domain often requires precise labels given by domain experts. Thus, we propose an explicit student vector estimation (ESVE) method to estimate the student vectors of DINA with a local self-consistent test, which does not rely on any assumptions for the probabilistic part of DINA. Then, based on the estimated student vectors, the probabilistic part of DINA can be modified into a student-dependent model in which the slip and guess rates are related to student vectors. Furthermore, we propose an unsupervised method called the heuristic bidirectional calibration algorithm (HBCA) to label the Q-matrix automatically, which connects the question difficulty relation and the answer results for initialization and uses the fault tolerance of ESVE-DINA for calibration. The experimental results on two real-world datasets show that ESVE-DINA outperforms the DINA model on accuracy and that the Q-matrix labeled automatically by HBCA can achieve performance comparable to that obtained with the manually labeled Q-matrix when using the same model structure.
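For reference, a small sketch of the classical DINA response model that ESVE and HBCA build on (the paper's extensions are not reproduced): a student answers question j correctly with probability 1 - s_j if they master every required skill, and with guess probability g_j otherwise.

```python
# Classical DINA: P(correct) = (1 - slip_j) if eta_ij = 1, else guess_j,
# where eta_ij = 1 iff student i masters all skills required by question j.
import numpy as np

def dina_prob(alpha, Q, slip, guess):
    """alpha: (n_students, n_skills) 0/1 mastery; Q: (n_questions, n_skills) 0/1 requirements."""
    eta = np.all(alpha[:, None, :] >= Q[None, :, :], axis=2)   # mastery covers requirements
    return np.where(eta, 1.0 - slip, guess)

alpha = np.array([[1, 1, 0],
                  [1, 0, 0]])                 # two students, three skills
Q = np.array([[1, 1, 0],
              [0, 0, 1]])                     # two questions and their required skills
slip = np.array([0.1, 0.2])
guess = np.array([0.2, 0.25])
print(np.round(dina_prob(alpha, Q, slip, guess), 2))
```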

Defining a synthetic data generator for realistic electric vehicle charging sessions arxiv:2203.01129 📈 0

Manu Lahariya, Dries Benoit, Chris Develder

**Abstract:** Electric vehicle (EV) charging stations have become prominent in electricity grids in the past years. Analysis of EV charging sessions is useful for flexibility analysis, load balancing, offering incentives to customers, etc. Yet, the limited availability of such EV sessions data hinders further development in these fields. Addressing this need for publicly available and realistic data, we develop a synthetic data generator (SDG) for EV charging sessions. Our SDG assumes the EV inter-arrival time to follow an exponential distribution. Departure times are modeled by defining a conditional probability density function (pdf) for connection times. This pdf for connection time and required energy is fitted by Gaussian mixture models. Since we train our SDG using a large real-world dataset, its output is realistic.
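A minimal sketch of the generator described in the abstract, under assumptions: exponential inter-arrival times and (connection time, required energy) drawn from a Gaussian mixture. The mixture here is fitted to toy data; the paper fits it to a large real-world dataset and conditions energy on connection time.

```python
# Toy synthetic-session generator: exponential arrivals + GMM-sampled sessions.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Toy "historical" sessions: (connection time [h], required energy [kWh]).
hist = np.column_stack([rng.gamma(4.0, 2.0, size=1000),
                        rng.gamma(3.0, 3.0, size=1000)])
gmm = GaussianMixture(n_components=3, random_state=0).fit(hist)

def generate_day(arrival_rate_per_hour=2.0, horizon_hours=24.0):
    sessions, t = [], 0.0
    while True:
        t += rng.exponential(1.0 / arrival_rate_per_hour)   # exponential inter-arrival times
        if t > horizon_hours:
            break
        conn, energy = np.abs(gmm.sample(1)[0][0])           # clip negative draws from the GMM
        sessions.append({"arrival_h": round(t, 2),
                         "connection_h": round(float(conn), 2),
                         "energy_kwh": round(float(energy), 2)})
    return sessions

print(generate_day()[:3])
```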

The Concordance Index decomposition: a measure for a deeper understanding of survival prediction models arxiv:2203.00144 📈 0

Abdallah Alabdallah, Mattias Ohlsson, Sepideh Pashami, Thorsteinn Rögnvaldsson

**Abstract:** The Concordance Index (C-index) is a commonly used metric in Survival Analysis to evaluate how good a prediction model is. This paper proposes a decomposition of the C-Index into a weighted harmonic mean of two quantities: one for ranking observed events versus other observed events, and the other for ranking observed events versus censored cases. This decomposition allows a more fine-grained analysis of the pros and cons of survival prediction methods. The utility of the decomposition is demonstrated using three benchmark survival analysis models (Cox Proportional Hazard, Random Survival Forest, and Deep Adversarial Time-to-Event Network) together with a new variational generative neural-network-based method (SurVED), which is also proposed in this paper. The demonstration is done on four publicly available datasets with varying censoring levels. The analysis with the C-index decomposition shows that all methods essentially perform equally well when the censoring level is high because of the dominance of the term measuring the ranking of events versus censored cases. In contrast, some methods deteriorate when the censoring level decreases because they do not rank the events versus other events well.
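A small sketch of the decomposition as described: the C-index over all comparable pairs splits into an events-vs-events part and an events-vs-censored part, and equals their harmonic mean weighted by each part's share of concordant pairs. Ties are ignored in this toy implementation, and the weighting is our reading of the abstract.

```python
# Decompose the C-index into event-event and event-censored concordance.
import numpy as np

def c_index_parts(time, event, risk):
    conc = {"ee": 0, "ec": 0}
    comp = {"ee": 0, "ec": 0}
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue                       # comparable pairs are anchored at an observed event
        for j in range(n):
            if i == j or time[j] <= time[i]:
                continue                   # ties and earlier times are skipped
            kind = "ee" if event[j] else "ec"
            comp[kind] += 1
            conc[kind] += risk[i] > risk[j]
    C_ee, C_ec = conc["ee"] / comp["ee"], conc["ec"] / comp["ec"]
    C_all = (conc["ee"] + conc["ec"]) / (comp["ee"] + comp["ec"])
    # Harmonic mean weighted by each part's share of concordant pairs.
    w_ee = conc["ee"] / (conc["ee"] + conc["ec"])
    C_harm = 1.0 / (w_ee / C_ee + (1.0 - w_ee) / C_ec)
    return C_ee, C_ec, C_all, C_harm

rng = np.random.default_rng(0)
t = rng.exponential(1.0, 200); e = rng.random(200) < 0.6; r = -t + 0.5 * rng.normal(size=200)
print([round(v, 3) for v in c_index_parts(t, e, r)])   # the last two values coincide
```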

MaMaDroid2.0 -- The Holes of Control Flow Graphs arxiv:2202.13922 📈 0

Harel Berger, Chen Hajaj, Enrico Mariconti, Amit Dvir

**Abstract:** Android malware is a continuously expanding threat to billions of mobile users around the globe. Detection systems are updated constantly to address these threats. However, a backlash takes the form of evasion attacks, in which an adversary changes malicious samples such that those samples will be misclassified as benign. This paper fully inspects a well-known Android malware detection system, MaMaDroid, which analyzes the control flow graph of the application. Changes to the proportion of benign samples in the training set and to the models are considered to see their effect on the classifier. The changes in the ratio between benign and malicious samples have a clear effect on each of the models, resulting in a decrease of more than 40% in their detection rate. Moreover, commonly adopted ML models are implemented as well, including 5-NN, Decision Tree, and AdaBoost. Exploration of the six models reveals typical behavior in different cases for tree-based models and distance-based models. Moreover, three novel attacks that manipulate the CFG, and their detection rates, are described for each of the targeted models. The attacks decrease the detection rate of most of the models to 0%, with regard to different ratios of benign to malicious apps. As a result, a new version of MaMaDroid is engineered. This model fuses the CFG of the app and static analysis of the app's features. This improved model is shown to be robust against evasion attacks targeting both CFG-based models and static analysis models, achieving a detection rate of more than 90% against each of the attacks.

Prev: 2022.02.27 Next: 2022.03.01