Prev: 2022.06.18 Next: 2022.06.20

Summary for 2022-06-19, created on 2022-06-29

A Self-Guided Framework for Radiology Report Generation arxiv:2206.09378 📈 87

Jun Li, Shibo Li, Ying Hu, Huiren Tao

**Abstract:** Automatic radiology report generation is essential to computer-aided diagnosis. Building on the success of image captioning, medical report generation has become achievable. However, the lack of annotated disease labels is still the bottleneck of this area. In addition, the image-text data bias problem and complex sentences make it more difficult to generate accurate reports. To address these gaps, we present a self-guided framework (SGF), a suite of unsupervised and supervised deep learning methods that mimics the process of human learning and writing. In detail, our framework obtains domain knowledge from medical reports without extra disease labels and guides itself to extract fine-grained visual features associated with the text. Moreover, SGF improves the accuracy and length of generated medical reports by incorporating a similarity comparison mechanism that imitates the process of human self-improvement through comparative practice. Extensive experiments demonstrate the utility of our SGF in the majority of cases, showing its superior performance over state-of-the-art methods. Our results highlight the capacity of the proposed framework to distinguish fine-grained visual details between words and verify its advantage in generating medical reports.

Resource-Efficient Separation Transformer arxiv:2206.09507 📈 21

Cem Subakan, Mirco Ravanelli, Samuele Cornell, Frédéric Lepoutre, François Grondin

**Abstract:** Transformers have recently achieved state-of-the-art performance in speech separation. These models, however, are computationally-demanding and require a lot of learnable parameters. This paper explores Transformer-based speech separation with a reduced computational cost. Our main contribution is the development of the Resource-Efficient Separation Transformer (RE-SepFormer), a self-attention-based architecture that reduces the computational burden in two ways. First, it uses non-overlapping blocks in the latent space. Second, it operates on compact latent summaries calculated from each chunk. The RE-SepFormer reaches a competitive performance on the popular WSJ0-2Mix and WHAM! datasets in both causal and non-causal settings. Remarkably, it scales significantly better than the previous Transformer and RNN-based architectures in terms of memory and inference-time, making it more suitable for processing long mixtures.

Policy Optimization with Linear Temporal Logic Constraints arxiv:2206.09546 📈 20

Cameron Voloshin, Hoang M. Le, Swarat Chaudhuri, Yisong Yue

**Abstract:** We study the problem of policy optimization (PO) with linear temporal logic (LTL) constraints. The language of LTL allows flexible description of tasks that may be unnatural to encode as a scalar cost function. We consider LTL-constrained PO as a systematic framework, decoupling task specification from policy selection, and an alternative to the standard of cost shaping. With access to a generative model, we develop a model-based approach that enjoys a sample complexity analysis for guaranteeing both task satisfaction and cost optimality (through a reduction to a reachability problem). Empirically, our algorithm can achieve strong performance even in low sample regimes.

Agricultural Plantation Classification using Transfer Learning Approach based on CNN arxiv:2206.09420 📈 15

Uphar Singh, Tushar Musale, Ranjana Vyas, O. P. Vyas

**Abstract:** Hyper-spectral images are images captured from a satellite that give spatial and spectral information about a specific region. A hyper-spectral image contains many more channels than an RGB image, and hence more information about the entities within the image, which makes it well suited for classifying objects in a snap. In the past years, the efficiency of hyper-spectral image recognition has increased significantly with deep learning. The Convolutional Neural Network (CNN) and Multi-Layer Perceptron (MLP) have been demonstrated to be excellent methods for classifying images. However, they suffer from long training times and the requirement of large amounts of labeled data to achieve the expected outcome. These issues become more complex when dealing with hyper-spectral images. To decrease the training time and reduce the dependence on a large labeled dataset, we propose using the method of transfer learning. The features learned by the CNN and MLP models are then used by the transfer learning model to solve a new classification problem on an unseen dataset. A detailed comparison of CNN and multiple MLP architectural models is performed to determine an optimum architecture that best suits the objective. The results show that scaling up the layers does not always lead to an increase in accuracy, but often leads to over-fitting as well as an increase in training time. The training time is reduced to a great extent by applying the transfer learning approach rather than directly training a new model on large datasets, without much affecting the accuracy.
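
The recipe described above (freeze a pretrained backbone, retrain only a small head on the new dataset) can be sketched in a few lines. A minimal sketch, assuming a torchvision ResNet stands in for the pretrained CNN and that the hyper-spectral input has already been reduced to three channels; the class count is a placeholder, not the paper's setup:

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 16  # placeholder: number of plantation classes

# Reuse a pretrained CNN as a frozen feature extractor.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for param in backbone.parameters():
    param.requires_grad = False  # keep the learned features fixed

# Replace the final layer with a new head for the unseen dataset.
backbone.fc = nn.Linear(backbone.fc.in_features, NUM_CLASSES)

optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def train_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    """One optimization step on the new classification head only."""
    optimizer.zero_grad()
    loss = criterion(backbone(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Only the head's parameters are updated, which is what cuts the training time relative to training a whole new model from scratch.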

Unbiased Teacher v2: Semi-supervised Object Detection for Anchor-free and Anchor-based Detectors arxiv:2206.09500 📈 12

Yen-Cheng Liu, Chih-Yao Ma, Zsolt Kira

**Abstract:** With the recent development of Semi-Supervised Object Detection (SS-OD) techniques, object detectors can be improved by using a limited amount of labeled data and abundant unlabeled data. However, there are still two challenges that are not addressed: (1) there is no prior SS-OD work on anchor-free detectors, and (2) prior works are ineffective when pseudo-labeling bounding box regression. In this paper, we present Unbiased Teacher v2, which shows the generalization of SS-OD methods to anchor-free detectors and also introduces a Listen2Student mechanism for the unsupervised regression loss. Specifically, we first present a study examining the effectiveness of existing SS-OD methods on anchor-free detectors and find that they achieve much lower performance improvements under the semi-supervised setting. We also observe that box selection with centerness and the localization-based labeling used in anchor-free detectors cannot work well under the semi-supervised setting. On the other hand, our Listen2Student mechanism explicitly prevents misleading pseudo-labels in the training of bounding box regression; we specifically develop a novel pseudo-labeling selection mechanism based on the Teacher and Student's relative uncertainties. This idea contributes to favorable improvement in the regression branch in the semi-supervised setting. Our method, which works for both anchor-free and anchor-based detectors, consistently performs favorably against state-of-the-art methods on VOC, COCO-standard, and COCO-additional.

ADBench: Anomaly Detection Benchmark arxiv:2206.09426 📈 11

Songqiao Han, Xiyang Hu, Hailiang Huang, Mingqi Jiang, Yue Zhao

**Abstract:** Given a long list of anomaly detection algorithms developed in the last few decades, how do they perform with regard to (i) varying levels of supervision, (ii) different types of anomalies, and (iii) noisy and corrupted data? In this work, we answer these key questions by conducting (to the best of our knowledge) the most comprehensive anomaly detection benchmark with 30 algorithms on 55 benchmark datasets, named ADBench. Our extensive experiments (93,654 in total) identify meaningful insights into the role of supervision and anomaly types, and unlock future directions for researchers in algorithm selection and design. With ADBench, researchers can easily conduct comprehensive and fair evaluations for newly proposed methods on the datasets (including our contributed ones from natural language and computer vision domains) against the existing baselines. To foster accessibility and reproducibility, we fully open-source ADBench and the corresponding results.

3D Object Detection for Autonomous Driving: A Review and New Outlooks arxiv:2206.09474 📈 10

Jiageng Mao, Shaoshuai Shi, Xiaogang Wang, Hongsheng Li

**Abstract:** Autonomous driving, in recent years, has been receiving increasing attention for its potential to relieve drivers' burdens and improve the safety of driving. In modern autonomous driving pipelines, the perception system is an indispensable component, aiming to accurately estimate the status of surrounding environments and provide reliable observations for prediction and planning. 3D object detection, which intelligently predicts the locations, sizes, and categories of the critical 3D objects near an autonomous vehicle, is an important part of a perception system. This paper reviews the advances in 3D object detection for autonomous driving. First, we introduce the background of 3D object detection and discuss the challenges in this task. Second, we conduct a comprehensive survey of the progress in 3D object detection from the aspects of models and sensory inputs, including LiDAR-based, camera-based, and multi-modal detection approaches. We also provide an in-depth analysis of the potentials and challenges in each category of methods. Additionally, we systematically investigate the applications of 3D object detection in driving systems. Finally, we conduct a performance analysis of the 3D object detection approaches, and we further summarize the research trends over the years and prospect the future directions of this area.

An Empirical Analysis on the Vulnerabilities of End-to-End Speech Segregation Models arxiv:2206.09556 📈 9

Rahil Parikh, Gaspar Rochette, Carol Espy-Wilson, Shihab Shamma

**Abstract:** End-to-end learning models have demonstrated a remarkable capability in performing speech segregation. Despite their wide scope of real-world applications, little is known about the mechanisms they employ to group and consequently segregate individual speakers. Knowing that harmonicity is a critical cue for these networks to group sources, in this work we perform a thorough investigation of ConvTasnet and DPT-Net to analyze how they perform a harmonic analysis of the input mixture. We perform ablation studies in which we apply low-pass, high-pass, and band-stop filters of varying pass-bands to empirically analyze the harmonics most critical for segregation. We also investigate how these networks decide which output channel to assign to an estimated source by introducing discontinuities in synthetic mixtures. We find that end-to-end networks are highly unstable and perform poorly when confronted with deformations that are imperceptible to humans. Replacing the encoder in these networks with a spectrogram leads to lower overall performance, but much higher stability. This work helps us to understand what information these networks rely on for speech segregation, and it exposes two sources of generalization errors. It also pinpoints the encoder as the part of the network responsible for these errors, allowing for a redesign informed by expert knowledge or transfer learning.

Scalable Neural Data Server: A Data Recommender for Transfer Learning arxiv:2206.09386 📈 8

Tianshi Cao, Sasha Doubov, David Acuna, Sanja Fidler

**Abstract:** Absence of large-scale labeled data in the practitioner's target domain can be a bottleneck to applying machine learning algorithms in practice. Transfer learning is a popular strategy for leveraging additional data to improve the downstream performance, but finding the most relevant data to transfer from can be challenging. Neural Data Server (NDS), a search engine that recommends relevant data for a given downstream task, has been previously proposed to address this problem. NDS uses a mixture of experts trained on data sources to estimate similarity between each source and the downstream task. Thus, the computational cost to each user grows with the number of sources. To address this issue, we propose Scalable Neural Data Server (SNDS), a large-scale search engine that can theoretically index thousands of datasets to serve relevant ML data to end users. SNDS trains the mixture of experts on intermediary datasets during initialization, and represents both data sources and downstream tasks by their proximity to the intermediary datasets. As such, the computational cost incurred by SNDS users remains fixed as new datasets are added to the server. We validate SNDS on a plethora of real-world tasks and find that data recommended by SNDS improves downstream task performance over baselines. We also demonstrate the scalability of SNDS by showing its ability to select relevant data for transfer outside of the natural image setting.

Meta-learning for Out-of-Distribution Detection via Density Estimation in Latent Space arxiv:2206.09543 📈 7

Tomoharu Iwata, Atsutoshi Kumagai

**Abstract:** Many neural network-based out-of-distribution (OoD) detection methods have been proposed. However, they require a large amount of training data for each target task. We propose a simple yet effective meta-learning method to detect OoD with a small amount of in-distribution data in a target task. With the proposed method, OoD detection is performed by density estimation in a latent space. A neural network shared among all tasks is used to flexibly map instances in the original space to the latent space. The neural network is meta-learned such that the expected OoD detection performance is improved by using various tasks that are different from the target tasks. This meta-learning procedure enables us to obtain appropriate representations in the latent space for OoD detection. For density estimation, we use a Gaussian mixture model (GMM) with full covariance for each class. We can adapt the GMM parameters to in-distribution data in each task in closed form by maximizing the likelihood. Since the closed-form solution is differentiable, we can meta-learn the neural network efficiently with a stochastic gradient descent method by incorporating the solution into the meta-learning objective function. In experiments using six datasets, we demonstrate that the proposed method achieves better performance than existing meta-learning and OoD detection methods.
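
A minimal sketch of the density-estimation step, with an identity stand-in for the meta-learned encoder and scikit-learn's `GaussianMixture` providing the per-class full-covariance Gaussians:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def encode(x: np.ndarray) -> np.ndarray:
    return x  # placeholder for the shared, meta-learned encoder

def fit_class_densities(z: np.ndarray, y: np.ndarray) -> dict:
    """Fit one full-covariance Gaussian per class in the latent space."""
    return {
        c: GaussianMixture(n_components=1, covariance_type="full").fit(z[y == c])
        for c in np.unique(y)
    }

def ood_score(models: dict, x_test: np.ndarray) -> np.ndarray:
    """High score = low likelihood under every class = likely OoD."""
    z = encode(x_test)
    log_liks = np.stack([m.score_samples(z) for m in models.values()])
    return -log_liks.max(axis=0)

# Usage on synthetic stand-in data: fit on small in-distribution data,
# then score shifted samples, which should receive higher OoD scores.
rng = np.random.default_rng(0)
x_train = rng.normal(size=(100, 4))
y_train = rng.integers(0, 3, size=100)
densities = fit_class_densities(encode(x_train), y_train)
scores = ood_score(densities, rng.normal(loc=5.0, size=(10, 4)))
```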

Towards Adversarial Attack on Vision-Language Pre-training Models arxiv:2206.09391 📈 7

Jiaming Zhang, Qi Yi, Jitao Sang

**Abstract:** While the vision-language pre-training model (VLP) has shown revolutionary improvements on various vision-language (V+L) tasks, studies of its adversarial robustness remain largely unexplored. This paper studies adversarial attacks on popular VLP models and V+L tasks. First, we analyze the performance of adversarial attacks under different settings. By examining the influence of different perturbed objects and attack targets, we draw some key observations that can guide the design of strong multimodal adversarial attacks and the construction of robust VLP models. Second, we propose a novel multimodal attack method on VLP models called Collaborative Multimodal Adversarial Attack (Co-Attack), which collectively carries out attacks on the image modality and the text modality. Experimental results demonstrate that the proposed method achieves improved attack performance on different V+L downstream tasks and VLP models. The analysis observations and novel attack method hopefully provide new understanding of the adversarial robustness of VLP models, so as to contribute to their safe and reliable deployment in more real-world scenarios.

Productive Reproducible Workflows for DNNs: A Case Study for Industrial Defect Detection arxiv:2206.09359 📈 6

Perry Gibson, José Cano

**Abstract:** As Deep Neural Networks (DNNs) have become an increasingly ubiquitous workload, the range of libraries and tooling available to aid in their development and deployment has grown significantly. Scalable, production-quality tools are freely available under permissive licenses, and are accessible enough to enable even small teams to be very productive. However, within the research community, awareness and usage of said tools is not necessarily widespread, and researchers may be missing out on potential productivity gains from exploiting the latest tools and workflows. This paper presents a case study where we discuss our recent experience producing an end-to-end artificial intelligence application for industrial defect detection. We detail the high-level deep learning libraries, containerized workflows, continuous integration/deployment pipelines, and open source code templates we leveraged to produce a competitive result, matching the performance of other ranked solutions on our three target datasets. We highlight the value that exploiting such systems can bring, even for research, detail our solution, and present our best results in terms of accuracy and inference time on a server-class GPU, as well as inference times on a server-class CPU and a Raspberry Pi 4.

Good Time to Ask: A Learning Framework for Asking for Help in Embodied Visual Navigation arxiv:2206.10606 📈 5

Jenny Zhang, Samson Yu, Jiafei Duan, Cheston Tan

**Abstract:** In reality, it is often more efficient to ask for help than to search the entire space to find an object with an unknown location. We present a learning framework that enables an agent to actively ask for help in such embodied visual navigation tasks, where the feedback informs the agent of where the goal is in its view. To emulate the real-world scenario that a teacher may not always be present, we propose a training curriculum where feedback is not always available. We formulate an uncertainty measure of where the goal is and use empirical results to show that through this approach, the agent learns to ask for help effectively while remaining robust when feedback is not available.

Artificial intelligence system based on multi-value classification of fully connected neural network for construction management arxiv:2206.10604 📈 5

Tetyana Honcharenko, Roman Akselrod, Andrii Shpakov, Oleksandr Khomenko

**Abstract:** This study is devoted to solving the problem of determining the professional adaptive capabilities of construction management staff using artificial intelligence systems. A Fully Connected Feed-Forward Neural Network architecture is proposed, and empirical modeling is performed to create a Data Set. The model of the artificial intelligence system allows evaluating the processes in a Fully Connected Feed-Forward Neural Network during the execution of multi-value classification of professional areas. A method has been developed for the training process of a machine learning model, which reflects the internal connections between the components of an artificial intelligence system that allow it to learn from training data. To train the neural network, a data set of 35 input parameters and 29 output parameters was used; the amount of data in the set is 936 data lines. The data were split for neural network training in the proportion of 10% and 90%. The results of this research can be used to further improve the knowledge and skills necessary for successful professional realization.
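
A plausible reconstruction with the stated dimensions (35 inputs, 29 outputs); the hidden sizes, the multi-label sigmoid head, and the 90/10 train/validation split are assumptions where the abstract is not specific:

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(35, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 29),  # one logit per professional area
)
criterion = nn.BCEWithLogitsLoss()  # multi-value (multi-label) targets
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# 936 rows as described; random stand-ins for the real Data Set.
X = torch.randn(936, 35)
Y = (torch.rand(936, 29) > 0.5).float()
n_train = int(0.9 * len(X))  # assumed 90%/10% split

for epoch in range(100):
    optimizer.zero_grad()
    loss = criterion(model(X[:n_train]), Y[:n_train])
    loss.backward()
    optimizer.step()

with torch.no_grad():
    val_loss = criterion(model(X[n_train:]), Y[n_train:])
```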

Bounding Evidence and Estimating Log-Likelihood in VAE arxiv:2206.09453 📈 5

Łukasz Struski, Marcin Mazur, Paweł Batorski, Przemysław Spurek, Jacek Tabor

**Abstract:** Many crucial problems in deep learning and statistics are caused by a variational gap, i.e., a difference between evidence and evidence lower bound (ELBO). As a consequence, in the classical VAE model, we obtain only the lower bound on the log-likelihood since ELBO is used as a cost function, and therefore we cannot compare log-likelihood between models. In this paper, we present a general and effective upper bound of the variational gap, which allows us to efficiently estimate the true evidence. We provide an extensive theoretical study of the proposed approach. Moreover, we show that by applying our estimation, we can easily obtain lower and upper bounds for the log-likelihood of VAE models.
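
For orientation, the textbook identity behind the variational gap (standard VAE algebra, not the paper's specific bound): the gap is exactly the KL term below, so pairing the ELBO with any upper bound on the gap brackets the log-likelihood from both sides.

```latex
\log p_\theta(x)
  = \underbrace{\mathbb{E}_{q_\phi(z \mid x)}\!\left[\log \tfrac{p_\theta(x,z)}{q_\phi(z \mid x)}\right]}_{\mathrm{ELBO}(x)}
  + \underbrace{\mathrm{KL}\!\left(q_\phi(z \mid x)\,\|\,p_\theta(z \mid x)\right)}_{\text{variational gap}\ \ge\ 0}
```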

Terrain Classification using Transfer Learning on Hyperspectral Images: A Comparative study arxiv:2206.09414 📈 5

Uphar Singh, Kumar Saurabh, Neelaksh Trehan, Ranjana Vyas, O. P. Vyas

**Abstract:** A hyperspectral image contains many more channels than an RGB image, and hence more information about the entities within the image. The convolutional neural network (CNN) and the Multi-Layer Perceptron (MLP) have been proven to be effective methods of image classification. However, they suffer from long training times and the requirement of large amounts of labeled data to achieve the expected outcome. These issues become more complex when dealing with hyperspectral images. To decrease the training time and reduce the dependence on a large labeled dataset, we propose using the method of transfer learning. The hyperspectral dataset is preprocessed to a lower dimension using PCA, and deep learning models are then applied to it for classification. The features learned by this model are then used by the transfer learning model to solve a new classification problem on an unseen dataset. A detailed comparison of CNN and multiple MLP architectural models is performed to determine an optimum architecture that best suits the objective. The results show that scaling up the layers does not always lead to an increase in accuracy, but often leads to overfitting as well as an increase in training time. The training time is reduced to a great extent by applying the transfer learning approach rather than directly training a new model on large datasets, without much affecting the accuracy.
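
The PCA preprocessing step reads roughly as follows; the image size and channel counts are placeholders, not the datasets used in the paper:

```python
import numpy as np
from sklearn.decomposition import PCA

H, W, C, C_REDUCED = 64, 64, 200, 30  # illustrative dimensions

cube = np.random.rand(H, W, C)   # stand-in hyperspectral image
pixels = cube.reshape(-1, C)     # one row per pixel, one column per band

pca = PCA(n_components=C_REDUCED)
reduced = pca.fit_transform(pixels)              # (H*W, C_REDUCED)
cube_reduced = reduced.reshape(H, W, C_REDUCED)  # back to image layout

print(f"variance retained: {pca.explained_variance_ratio_.sum():.3f}")
```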

Prevent Car Accidents by Using AI arxiv:2206.11381 📈 4

Sri Siddhartha Reddy Gudemupati, Yen Ling Chao, Lakshmi Praneetha Kotikalapudi, Ebrima Ceesay

**Abstract:** Transportation facilities are becoming more developed as society develops, and people's travel demand is increasing, but so are the traffic safety issues that arise as a result. Car accidents are a major issue all over the world, and the cost of traffic fatalities and driver injuries has a significant impact on society. The use of machine learning techniques in the field of traffic accidents is becoming increasingly popular. Machine learning classifiers are used instead of traditional data mining techniques to produce better results and accuracy. Accordingly, this project conducts research on existing work related to accident prediction using machine learning. We will use crash data and weather data to train machine learning models to predict crash severity and reduce crashes.
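
A hedged sketch of the stated plan (train a classifier on crash and weather features to predict severity); every column name and row below is a synthetic stand-in, not the project's data:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "speed_limit": rng.choice([30, 50, 70], size=500),
    "hour_of_day": rng.integers(0, 24, size=500),
    "precipitation_mm": rng.exponential(1.0, size=500),
    "severity": rng.integers(0, 3, size=500),  # 0 = minor .. 2 = fatal
})

X_train, X_test, y_train, y_test = train_test_split(
    df.drop(columns="severity"), df["severity"], test_size=0.2, random_state=0
)
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
```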

Robust One Round Federated Learning with Predictive Space Bayesian Inference arxiv:2206.09526 📈 4

Mohsin Hasan, Zehao Zhang, Kaiyang Guo, Mahdi Karami, Guojun Zhang, Xi Chen, Pascal Poupart

**Abstract:** Making predictions robust is an important challenge. A separate challenge in federated learning (FL) is to reduce the number of communication rounds, particularly since doing so reduces performance in heterogeneous data settings. To tackle both issues, we take a Bayesian perspective on the problem of learning a global model. We show how the global predictive posterior can be approximated using client predictive posteriors. This is unlike other works which aggregate the local model space posteriors into the global model space posterior, and are susceptible to high approximation errors due to the posterior's high dimensional multimodal nature. In contrast, our method performs the aggregation on the predictive posteriors, which are typically easier to approximate owing to the low-dimensionality of the output space. We present an algorithm based on this idea, which performs MCMC sampling at each client to obtain an estimate of the local posterior, and then aggregates these in one round to obtain a global ensemble model. Through empirical evaluation on several classification and regression tasks, we show that despite using one round of communication, the method is competitive with other FL techniques, and outperforms them on heterogeneous settings. The code is publicly available at https://github.com/hasanmohsin/FedPredSpace_1Round.
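
A minimal sketch of the one-round aggregation idea, assuming each client summarizes its MCMC samples as predictive class probabilities; the equal-weight mixture is an illustrative choice, not necessarily the paper's exact rule:

```python
import numpy as np

def client_predictive(mcmc_probs: np.ndarray) -> np.ndarray:
    """Local predictive posterior: average class probabilities over
    MCMC samples, shape (num_samples, num_points, num_classes)."""
    return mcmc_probs.mean(axis=0)

def server_aggregate(client_preds: list) -> np.ndarray:
    """One communication round: mix the client predictive posteriors."""
    return np.mean(np.stack(client_preds), axis=0)

rng = np.random.default_rng(0)
clients = [rng.dirichlet(np.ones(3), size=(50, 8)) for _ in range(4)]
global_pred = server_aggregate([client_predictive(c) for c in clients])
labels = global_pred.argmax(axis=-1)  # ensemble prediction per point
```

Aggregating in the (low-dimensional) output space this way sidesteps averaging over the high-dimensional, multimodal model-space posterior.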

Hybrid Facial Expression Recognition (FER2013) Model for Real-Time Emotion Classification and Prediction arxiv:2206.09509 📈 4

Ozioma Collins Oguine, Kaleab Alamayehu Kinfu, Kanyifeechukwu Jane Oguine, Hashim Ibrahim Bisallah, Daniel Ofuani

**Abstract:** Facial Expression Recognition is a vital research topic in most fields ranging from artificial intelligence and gaming to Human-Computer Interaction (HCI) and Psychology. This paper proposes a hybrid model for Facial Expression recognition, which comprises a Deep Convolutional Neural Network (DCNN) and Haar Cascade deep learning architectures. The objective is to classify real-time and digital facial images into one of the seven facial emotion categories considered. The DCNN employed in this research has more convolutional layers, ReLU activation functions, and multiple kernels to enhance filtering depth and facial feature extraction. In addition, a Haar cascade model was also used to detect facial features in real-time images and video frames. Grayscale images from the Kaggle repository (FER-2013) were used, and Graphics Processing Unit (GPU) computation was exploited to expedite the training and validation process. Pre-processing and data augmentation techniques are applied to improve training efficiency and classification performance. The experimental results show a significantly improved classification performance compared to state-of-the-art (SoTA) experiments and research. Compared to other conventional models, this paper validates that the proposed architecture is superior in classification performance, with an improvement of up to 6%, totaling up to 70% accuracy, and with a lower execution time of 2098.8 s.
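
The two-stage pipeline (Haar cascade localization feeding a CNN classifier) might look as follows; the CNN here is a trivial stub standing in for the paper's DCNN:

```python
import cv2
import numpy as np
import torch

EMOTIONS = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]

# OpenCV ships a pretrained frontal-face Haar cascade.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)

def classify_frame(frame: np.ndarray, cnn: torch.nn.Module) -> list:
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    results = []
    for (x, y, w, h) in cascade.detectMultiScale(gray, 1.3, 5):
        crop = cv2.resize(gray[y:y + h, x:x + w], (48, 48))  # FER-2013 size
        tensor = torch.from_numpy(crop).float().div(255)[None, None]
        logits = cnn(tensor)
        results.append(((x, y, w, h), EMOTIONS[logits.argmax().item()]))
    return results

# Stub CNN: flattens the 48x48 crop into 7 emotion logits.
stub_cnn = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(48 * 48, 7))
frame = np.zeros((240, 320, 3), dtype=np.uint8)  # stand-in video frame
print(classify_frame(frame, stub_cnn))
```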

LogGENE: A smooth alternative to check loss for Deep Healthcare Inference Tasks arxiv:2206.09333 📈 4

Aryaman Jeendgar, Aditya Pola, Soma S Dhavala, Snehanshu Saha

**Abstract:** High-throughput Genomics is ushering in a new era in personalized health care, and targeted drug design and delivery. Mining these large datasets and obtaining calibrated predictions is of immediate relevance and utility. In our work, we develop methods for Gene Expression Inference based on Deep neural networks. However, unlike typical deep learning methods, our inferential technique, while achieving state-of-the-art performance in terms of accuracy, can also provide explanations and report uncertainty estimates. We adopt the Quantile Regression framework to predict full conditional quantiles for a given set of housekeeping gene expressions. Conditional quantiles, in addition to being useful in providing rich interpretations of the predictions, are also robust to measurement noise. However, the check loss, used in quantile regression to drive the estimation process, is not differentiable. We propose log-cosh as a smooth alternative to the check loss. We apply our methods to the GEO microarray dataset. We also extend the method to the binary classification setting. Furthermore, we investigate other consequences of the smoothness of the loss in faster convergence.
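
One plausible way to build such a smooth surrogate uses the identity ρ_τ(u) = u(τ − 1/2) + |u|/2 together with the approximation |u| ≈ log cosh(u); this is a sketch of the idea, not necessarily the paper's exact loss:

```python
import torch

def check_loss(u: torch.Tensor, tau: float) -> torch.Tensor:
    """Classical pinball/check loss: not differentiable at u = 0."""
    return u * (tau - (u < 0).float())

def log_cosh_check_loss(u: torch.Tensor, tau: float) -> torch.Tensor:
    """Smooth surrogate: replace |u| with log(cosh(u)), which tracks
    |u| up to an additive log(2) as |u| grows."""
    return u * (tau - 0.5) + 0.5 * torch.log(torch.cosh(u))

# Residuals u = y - y_hat for the 0.9 quantile; gradients now exist
# everywhere, which is the point of the smooth alternative.
u = torch.linspace(-3, 3, 7, requires_grad=True)
log_cosh_check_loss(u, tau=0.9).sum().backward()
```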

Navigating Incommensurability Between Ethnomethodology, Conversation Analysis, and Artificial Intelligence arxiv:2206.11899 📈 3

Stuart Reeves

**Abstract:** Like many research communities, ethnomethodologists and conversation analysts have begun to get caught up -- yet again -- in the pervasive spectacle of surging interests in Artificial Intelligence (AI). Inspired by discussions amongst a growing network of researchers in ethnomethodology (EM) and conversation analysis (CA) traditions who nurse such interests, I started thinking about what things EM and the more EM end of conversation analysis might be doing about, for, or even with, fields of AI research. So, this piece is about the disciplinary and conceptual questions that might be encountered, and -- in my view -- may need addressing for engagements with AI research and its affiliates. Although I'm mostly concerned with things to be aware of as well as outright dangers, later on we can think about some opportunities. And throughout I will keep using 'we' to talk about EM&CA researchers; but this really is for convenience only -- I don't wish to ventriloquise for our complex research communities. All of the following should be read as emanating from my particular research history, standpoint etc., and treated (hopefully) as an invitation for further discussion amongst EM and CA researchers turning to technology and AI specifically.

Shuffle Gaussian Mechanism for Differential Privacy arxiv:2206.09569 📈 3

Seng Pei Liew, Tsubasa Takahashi

**Abstract:** We study the Gaussian mechanism in the shuffle model of differential privacy (DP). In particular, we characterize the mechanism's Rényi differential privacy (RDP), showing that it is of the form: $$ ε(λ) \leq \frac{1}{λ-1}\log\left(\frac{e^{-λ/2σ^2}}{n^λ}\sum_{\substack{k_1+\dotsb+k_n=λ;\\k_1,\dotsc,k_n\geq 0}}\binom{λ}{k_1,\dotsc,k_n}e^{\sum_{i=1}^nk_i^2/2σ^2}\right). $$ We further prove that the RDP is strictly upper-bounded by the Gaussian RDP without shuffling. The shuffle Gaussian RDP is advantageous in composing multiple DP mechanisms, where we demonstrate its improvement over the state-of-the-art approximate DP composition theorems in privacy guarantees of the shuffle model. Moreover, we extend our study to the subsampled shuffle mechanism and the recently proposed shuffled check-in mechanism, which are protocols geared towards distributed/federated learning. Finally, an empirical study of these mechanisms is given to demonstrate the efficacy of employing the shuffle Gaussian mechanism under the distributed learning framework to guarantee rigorous user privacy.
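
The displayed bound can be checked numerically for small n and integer λ by brute-force enumeration of the compositions of λ (illustrative only; the loop is exponential in n):

```python
import math
from itertools import product

def multinomial(lam: int, ks: tuple) -> int:
    """Multinomial coefficient lam! / (k_1! * ... * k_n!)."""
    out = math.factorial(lam)
    for k in ks:
        out //= math.factorial(k)
    return out

def shuffle_gaussian_rdp_bound(lam: int, n: int, sigma: float) -> float:
    """Evaluate the RDP upper bound exactly as displayed above."""
    total = 0.0
    for ks in product(range(lam + 1), repeat=n):
        if sum(ks) != lam:
            continue
        total += multinomial(lam, ks) * math.exp(
            sum(k * k for k in ks) / (2 * sigma**2)
        )
    inner = math.exp(-lam / (2 * sigma**2)) / n**lam * total
    return math.log(inner) / (lam - 1)

print(shuffle_gaussian_rdp_bound(lam=4, n=5, sigma=2.0))
```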

Multiple Testing Framework for Out-of-Distribution Detection arxiv:2206.09522 📈 3

Akshayaa Magesh, Venugopal V. Veeravalli, Anirban Roy, Susmit Jha

**Abstract:** We study the problem of Out-of-Distribution (OOD) detection, that is, detecting whether a learning algorithm's output can be trusted at inference time. While a number of tests for OOD detection have been proposed in prior work, a formal framework for studying this problem is lacking. We propose a definition for the notion of OOD that includes both the input distribution and the learning algorithm, which provides insights for the construction of powerful tests for OOD detection. We propose a multiple hypothesis testing inspired procedure to systematically combine any number of different statistics from the learning algorithm using conformal p-values. We further provide strong guarantees on the probability of incorrectly classifying an in-distribution sample as OOD. In our experiments, we find that threshold-based tests proposed in prior work perform well in specific settings, but not uniformly well across different types of OOD instances. In contrast, our proposed method that combines multiple statistics performs uniformly well across different datasets and neural networks.
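
A hedged sketch of the conformal ingredient: each statistic is turned into a conformal p-value against a calibration set, and several p-values are combined, here with a Bonferroni-style minimum, which may differ from the paper's exact procedure:

```python
import numpy as np

def conformal_p_value(cal_scores: np.ndarray, test_score: float) -> float:
    """Larger score = more anomalous; valid p-value under exchangeability."""
    return (1 + np.sum(cal_scores >= test_score)) / (len(cal_scores) + 1)

def combined_ood_test(cal_stats: list, test_stats: list, alpha: float = 0.05) -> bool:
    """cal_stats[j] holds calibration values of the j-th statistic."""
    pvals = [conformal_p_value(c, t) for c, t in zip(cal_stats, test_stats)]
    return min(pvals) * len(pvals) < alpha  # True => flag sample as OOD

rng = np.random.default_rng(0)
cal = [rng.normal(size=500), rng.normal(size=500)]
print(combined_ood_test(cal, test_stats=[4.2, 0.1]))  # first statistic fires
```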

Learning Multi-Task Transferable Rewards via Variational Inverse Reinforcement Learning arxiv:2206.09498 📈 3

Se-Wook Yoo, Seung-Woo Seo

**Abstract:** Many robotic tasks are composed of many temporally correlated sub-tasks in a highly complex environment. It is important to discover situational intentions and proper actions by deliberating on temporal abstractions to solve problems effectively. To understand the intention separated from changing task dynamics, we extend an empowerment-based regularization technique to situations with multiple tasks based on the framework of a generative adversarial network. In multi-task environments with unknown dynamics, we focus on learning a reward and policy from unlabeled expert examples. In this study, we define situational empowerment as the maximum of mutual information representing how an action conditioned on both a certain state and sub-task affects the future. Our proposed method derives the variational lower bound of the situational mutual information to optimize it. We simultaneously learn the transferable multi-task reward function and policy by adding an induced term to the objective function. By doing so, the multi-task reward function helps to learn a robust policy for environmental change. We validate the advantages of our approach on multi-task learning and multi-task transfer learning. We demonstrate that our proposed method is robust to both randomness and changing task dynamics. Finally, we show that our method achieves significantly better performance and data efficiency than existing imitation learning methods on various benchmarks.

StudioGAN: A Taxonomy and Benchmark of GANs for Image Synthesis arxiv:2206.09479 📈 3

Minguk Kang, Joonghyuk Shin, Jaesik Park

**Abstract:** Generative Adversarial Network (GAN) is one of the state-of-the-art generative models for realistic image synthesis. While training and evaluating GANs becomes increasingly important, the current GAN research ecosystem does not provide reliable benchmarks for which the evaluation is conducted consistently and fairly. Furthermore, because there are few validated GAN implementations, researchers devote considerable time to reproducing baselines. We study the taxonomy of GAN approaches and present a new open-source library named StudioGAN. StudioGAN supports 7 GAN architectures, 9 conditioning methods, 4 adversarial losses, 13 regularization modules, 3 differentiable augmentations, 7 evaluation metrics, and 5 evaluation backbones. With our training and evaluation protocol, we present a large-scale benchmark using various datasets (CIFAR10, ImageNet, AFHQv2, FFHQ, and Baby/Papa/Granpa-ImageNet) and 3 different evaluation backbones (InceptionV3, SwAV, and Swin Transformer). Unlike other benchmarks used in the GAN community, we train representative GANs, including BigGAN, StyleGAN2, and StyleGAN3, in a unified training pipeline and quantify generation performance with 7 evaluation metrics. The benchmark also evaluates other cutting-edge generative models (e.g., StyleGAN-XL, ADM, MaskGIT, and RQ-Transformer). StudioGAN provides GAN implementations, training, and evaluation scripts with the pre-trained weights. StudioGAN is available at https://github.com/POSTECH-CVLab/PyTorch-StudioGAN.

All you need is feedback: Communication with block attention feedback codes arxiv:2206.09457 📈 3

Emre Ozfatura, Yulin Shao, Alberto Perotti, Branislav Popovic, Deniz Gunduz

**Abstract:** Deep learning based channel code designs have recently gained interest as an alternative to conventional coding algorithms, particularly for channels for which existing codes do not provide effective solutions. Communication over a feedback channel is one such problem, for which promising results have recently been obtained by employing various deep learning architectures. In this paper, we introduce a novel learning-aided code design for feedback channels, called generalized block attention feedback (GBAF) codes, which i) employs a modular architecture that can be implemented using different neural network architectures; ii) provides order-of-magnitude improvements in the probability of error compared to existing designs; and iii) can transmit at desired code rates.

Towards Unified Conversational Recommender Systems via Knowledge-Enhanced Prompt Learning arxiv:2206.09363 📈 3

Xiaolei Wang, Kun Zhou, Ji-Rong Wen, Wayne Xin Zhao

**Abstract:** Conversational recommender systems (CRS) aim to proactively elicit user preference and recommend high-quality items through natural language conversations. Typically, a CRS consists of a recommendation module to predict preferred items for users and a conversation module to generate appropriate responses. To develop an effective CRS, it is essential to seamlessly integrate the two modules. Existing works either design semantic alignment strategies, or share knowledge resources and representations between the two modules. However, these approaches still rely on different architectures or techniques to develop the two modules, making it difficult for effective module integration. To address this problem, we propose a unified CRS model named UniCRS based on knowledge-enhanced prompt learning. Our approach unifies the recommendation and conversation subtasks into the prompt learning paradigm, and utilizes knowledge-enhanced prompts based on a fixed pre-trained language model (PLM) to fulfill both subtasks in a unified approach. In the prompt design, we include fused knowledge representations, task-specific soft tokens, and the dialogue context, which can provide sufficient contextual information to adapt the PLM for the CRS task. Besides, for the recommendation subtask, we also incorporate the generated response template as an important part of the prompt, to enhance the information interaction between the two subtasks. Extensive experiments on two public CRS datasets have demonstrated the effectiveness of our approach.

A Unified Understanding of Deep NLP Models for Text Classification arxiv:2206.09355 📈 3

Zhen Li, Xiting Wang, Weikai Yang, Jing Wu, Zhengyan Zhang, Zhiyuan Liu, Maosong Sun, Hui Zhang, Shixia Liu

**Abstract:** The rapid development of deep natural language processing (NLP) models for text classification has led to an urgent need for a unified understanding of these models proposed individually. Existing methods cannot meet the need for understanding different models in one framework due to the lack of a unified measure for explaining both low-level (e.g., words) and high-level (e.g., phrases) features. We have developed a visual analysis tool, DeepNLPVis, to enable a unified understanding of NLP models for text classification. The key idea is a mutual information-based measure, which provides quantitative explanations on how each layer of a model maintains the information of input words in a sample. We model the intra- and inter-word information at each layer measuring the importance of a word to the final prediction as well as the relationships between words, such as the formation of phrases. A multi-level visualization, which consists of a corpus-level, a sample-level, and a word-level visualization, supports the analysis from the overall training set to individual samples. Two case studies on classification tasks and comparison between models demonstrate that DeepNLPVis can help users effectively identify potential problems caused by samples and model architectures and then make informed improvements.

Finding Diverse and Predictable Subgraphs for Graph Domain Generalization arxiv:2206.09345 📈 3

Junchi Yu, Jian Liang, Ran He

**Abstract:** This paper focuses on out-of-distribution generalization on graphs where performance drops due to the unseen distribution shift. Previous graph domain generalization works always resort to learning an invariant predictor among different source domains. However, they assume sufficient source domains are available during training, posing huge challenges for realistic applications. By contrast, we propose a new graph domain generalization framework, dubbed DPS, by constructing multiple populations from the source domains. Specifically, DPS aims to discover multiple **D**iverse and **P**redictable **S**ubgraphs with a set of generators, namely, subgraphs that are different from each other but all of which share the same semantics as the input graph. These generated source domains are exploited to learn an *equi-predictive* graph neural network (GNN) across domains, which is expected to generalize well to unseen target domains. Generally, DPS is model-agnostic and can be incorporated with various GNN backbones. Extensive experiments on both node-level and graph-level benchmarks show that the proposed DPS achieves impressive performance for various graph domain generalization tasks.

Bayesian Optimization under Stochastic Delayed Feedback arxiv:2206.09341 📈 3

Arun Verma, Zhongxiang Dai, Bryan Kian Hsiang Low

**Abstract:** Bayesian optimization (BO) is a widely-used sequential method for zeroth-order optimization of complex and expensive-to-compute black-box functions. The existing BO methods assume that the function evaluation (feedback) is available to the learner immediately or after a fixed delay. Such assumptions may not be practical in many real-life problems like online recommendations, clinical trials, and hyperparameter tuning where feedback is available after a random delay. To benefit from the experimental parallelization in these problems, the learner needs to start new function evaluations without waiting for delayed feedback. In this paper, we consider the BO under stochastic delayed feedback problem. We propose algorithms with sub-linear regret guarantees that efficiently address the dilemma of selecting new function queries while waiting for randomly delayed feedback. Building on our results, we also make novel contributions to batch BO and contextual Gaussian process bandits. Experiments on synthetic and real-life datasets verify the performance of our algorithms.

Traffic-Twitter Transformer: A Nature Language Processing-joined Framework For Network-wide Traffic Forecasting arxiv:2206.11078 📈 2

Meng-Ju Tsai, Zhiyong Cui, Hao Yang, Yinhai Wang

**Abstract:** With accurate and timely traffic forecasting, the impacted traffic conditions can be predicted in advance to guide agencies and residents to respond to changes in traffic patterns appropriately. However, existing works on traffic forecasting mainly relied on historical traffic patterns, confining them to short-term prediction (under 1 hour, for instance). To better manage future roadway capacity and accommodate social and human impacts, it is crucial to propose a flexible and comprehensive framework to predict physical-aware long-term traffic conditions for public users and transportation agencies. In this paper, the gap in robust long-term traffic forecasting is bridged by taking social media features into consideration. A correlation study and a linear regression model were first implemented to evaluate the significance of the correlation between two time-series data, traffic intensity and Twitter data intensity. The two time series were then fed into our proposed social-aware framework, Traffic-Twitter Transformer, which integrates natural language representations into time-series records for long-term traffic prediction. Experimental results in the Greater Seattle Area show that our proposed model outperformed baseline models in all evaluation metrics. This NLP-joined social-aware framework can become a valuable instrument for network-wide traffic prediction and management for traffic agencies.
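
The first step of the pipeline (checking that the two intensity series co-move before fusing them) is straightforward; the series below are synthetic stand-ins for the Seattle-area data:

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
hours = np.arange(24 * 7)  # one week of hourly observations
daily = np.sin(2 * np.pi * hours / 24)
traffic = 50 + 20 * daily + rng.normal(0, 3, hours.size)
tweets = 10 + 4 * daily + rng.normal(0, 1, hours.size)

r, p = pearsonr(traffic, tweets)
print(f"Pearson r = {r:.3f} (p = {p:.2g})")

# Simple linear regression of traffic on tweet intensity, as in the
# significance check described above.
slope, intercept = np.polyfit(tweets, traffic, deg=1)
```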

An Embedded Feature Selection Framework for Control arxiv:2206.11064 📈 2

Jiawen Wei, Fangyuan Wang, Wanxin Zeng, Wenwei Lin, Ning Gui

**Abstract:** Reducing sensor requirements while keeping optimal control performance is crucial to many industrial control applications to achieve robust, low-cost, and computation-efficient controllers. However, existing feature selection solutions for the typical machine learning domain can hardly be applied in the domain of control with changing dynamics. In this paper, a novel framework, namely the Dual-world embedded Attentive Feature Selection (D-AFS), can efficiently select the most relevant sensors for the system under dynamic control. Rather than the one world used in most Deep Reinforcement Learning (DRL) algorithms, D-AFS has both the real world and its virtual peer with twisted features. By analyzing the DRL's response in the two worlds, D-AFS can quantitatively identify respective features' importance towards control. A well-known active flow control problem, cylinder drag reduction, is used for evaluation. Results show that D-AFS successfully finds an optimized five-probe layout with 18.7% more drag reduction than the state-of-the-art solution with 151 probes, and 49.2% more than a five-probe layout designed by human experts. We also apply this solution to four OpenAI classical control cases. In all cases, D-AFS achieves the same or better sensor configurations than the originally provided solutions. These results highlight, we argue, a new way to achieve efficient and optimal sensor designs for experimental or industrial systems. Our source code is made publicly available at https://github.com/G-AILab/DAFSFluid.

A Novel Long-term Iterative Mining Scheme for Video Salient Object Detection arxiv:2206.09564 📈 2

Chenglizhao Chen, Hengsen Wang, Yuming Fang, Chong Peng

**Abstract:** The existing state-of-the-art (SOTA) video salient object detection (VSOD) models have widely followed a short-term methodology, which dynamically determines the balance between spatial and temporal saliency fusion by considering only the current consecutive limited frames. However, the short-term methodology has one critical limitation, which conflicts with the real mechanism of our visual system -- a typical long-term methodology. As a result, failure cases keep showing up in the results of the current SOTA models, and the short-term methodology becomes the major technical bottleneck. To solve this problem, this paper proposes a novel VSOD approach, which performs VSOD in a completely long-term way. Our approach converts VSOD, a sequential task, into a data mining problem, i.e., decomposing the input video sequence into object proposals in advance and then mining salient object proposals as much as possible in an easy-to-hard way. Since all object proposals are simultaneously available, the proposed approach is a completely long-term approach, which can alleviate some difficulties rooted in conventional short-term approaches. In addition, we devised an online updating scheme that can grasp the most representative and trustworthy pattern profile of the salient objects, outputting framewise saliency maps with rich details that are smooth both spatially and temporally. The proposed approach outperforms almost all SOTA models on five widely used benchmark datasets.

Eliminating The Impossible, Whatever Remains Must Be True arxiv:2206.09551 📈 2

Jinqiang Yu, Alexey Ignatiev, Peter J. Stuckey, Nina Narodytska, Joao Marques-Silva

**Abstract:** The rise of AI methods to make predictions and decisions has led to a pressing need for more explainable artificial intelligence (XAI) methods. One common approach for XAI is to produce a post-hoc explanation, explaining why a black box ML model made a certain prediction. Formal approaches to post-hoc explanations provide succinct reasons for why a prediction was made, as well as why not another prediction was made. But these approaches assume that features are independent and uniformly distributed. While this means that "why" explanations are correct, they may be longer than required. It also means the "why not" explanations may be suspect as the counterexamples they rely on may not be meaningful. In this paper, we show how one can apply background knowledge to give more succinct "why" formal explanations, that are presumably easier to interpret by humans, and give more accurate "why not" explanations. Furthermore, we also show how to use existing rule induction techniques to efficiently extract background information from a dataset, and also how to report which background information was used to make an explanation, allowing a human to examine it if they doubt the correctness of the explanation.

A Parallel Implementation of Computing Mean Average Precision arxiv:2206.09504 📈 2

Beinan Wang

**Abstract:** Mean Average Precision (mAP) has been widely used for evaluating the quality of object detectors, but an efficient implementation is still absent. Current implementations can only count true positives (TP's) and false positives (FP's) for one class at a time by looping through every detection of that class sequentially. Not only are these approaches inefficient, but they are also inconvenient for reporting validation mAP during training. We propose a parallelized alternative that can process mini-batches of detected bounding boxes (DTBB's) and ground truth bounding boxes (GTBB's) as inference proceeds, such that mAP can be calculated instantly after inference is finished. Loops and control statements in sequential implementations are replaced with extensive use of broadcasting, masking, and indexing. All operators involved are supported by popular machine learning frameworks such as PyTorch and TensorFlow. As a result, our implementation is much faster and can easily fit into typical training routines. A PyTorch version of our implementation is available at https://github.com/bwangca/fast-map.
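
The broadcasting idea can be illustrated as follows (a sketch, not the authors' implementation); for brevity it omits the score ordering and greedy one-to-one matching that a full mAP computation also enforces:

```python
import torch

def pairwise_iou(dt: torch.Tensor, gt: torch.Tensor) -> torch.Tensor:
    """All-pairs IoU between (D, 4) detections and (G, 4) ground truths,
    boxes as (x1, y1, x2, y2), computed with broadcasting, no loops."""
    lt = torch.maximum(dt[:, None, :2], gt[None, :, :2])  # (D, G, 2)
    rb = torch.minimum(dt[:, None, 2:], gt[None, :, 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[..., 0] * wh[..., 1]
    area_dt = (dt[:, 2] - dt[:, 0]) * (dt[:, 3] - dt[:, 1])
    area_gt = (gt[:, 2] - gt[:, 0]) * (gt[:, 3] - gt[:, 1])
    return inter / (area_dt[:, None] + area_gt[None, :] - inter)

def tp_mask(dt: torch.Tensor, gt: torch.Tensor, thr: float = 0.5) -> torch.Tensor:
    """Boolean mask of detections whose best overlap clears the threshold."""
    best, _ = pairwise_iou(dt, gt).max(dim=1)
    return best >= thr
```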

SNN2ANN: A Fast and Memory-Efficient Training Framework for Spiking Neural Networks arxiv:2206.09449 📈 2

Jianxiong Tang, Jianhuang Lai, Xiaohua Xie, Lingxiao Yang, Wei-Shi Zheng

**Abstract:** Spiking neural networks are efficient computation models for low-power environments. Spike-based BP algorithms and ANN-to-SNN (ANN2SNN) conversions are successful techniques for SNN training. Nevertheless, spike-based BP training is slow and requires large memory costs. Though ANN2SNN provides a low-cost way to train SNNs, it requires many inference steps to mimic the well-trained ANN for good performance. In this paper, we propose an SNN-to-ANN (SNN2ANN) framework to train SNNs in a fast and memory-efficient way. SNN2ANN consists of two components: a) a weight-sharing architecture between the ANN and SNN, and b) spiking mapping units. First, the architecture trains the weight-sharing parameters on the ANN branch, resulting in fast training and low memory costs for the SNN. Second, the spiking mapping units ensure that the activation values of the ANN are the spiking features. As a result, the classification error of the SNN can be optimized by training the ANN branch. Besides, we design an adaptive threshold adjustment (ATA) algorithm to address the noisy spike problem. Experimental results show that our SNN2ANN-based models perform well on the benchmark datasets (CIFAR10, CIFAR100, and Tiny-ImageNet). Moreover, SNN2ANN achieves comparable accuracy with 0.625x the time steps, 0.377x the training time, 0.27x the GPU memory costs, and 0.33x the spike activities of the Spike-based BP model.

A generalized regionalization framework for geographical modelling and its application in spatial regression arxiv:2206.09429 📈 2

Hao Guo, Andre Python, Yu Liu

**Abstract:** In the presence of spatial heterogeneity, models applied to geographic data face a trade-off between producing general results and capturing local variations. Modelling at a regional scale may allow the identification of solutions that optimize both accuracy and generality. However, most current regionalization algorithms assume homogeneity in the attributes to delineate regions without considering the processes that generate the attributes. In this paper, we propose a generalized regionalization framework based on a two-item objective function which favors solutions with the highest overall accuracy while minimizing the number of regions. We introduce three regionalization algorithms, which extend previous methods that account for spatially constrained clustering. The effectiveness of the proposed framework is examined in regression experiments on both simulated and real data. The results show that a spatially implicit algorithm extended with an automatic post-processing procedure outperforms spatially explicit approaches. Our suggested framework contributes to better capturing the processes associated with spatial heterogeneity, with potential applications in a wide range of geographical models.

Faster Sampling from Log-Concave Distributions over Polytopes via a Soft-Threshold Dikin Walk arxiv:2206.09384 📈 2

Oren Mangoubi, Nisheeth K. Vishnoi

**Abstract:** We consider the problem of sampling from a $d$-dimensional log-concave distribution $π(θ) \propto e^{-f(θ)}$ constrained to a polytope $K$ defined by $m$ inequalities. Our main result is a "soft-threshold'' variant of the Dikin walk Markov chain that requires at most $O((md + d L^2 R^2) \times md^{ω-1} \log(\frac{w}{δ}))$ arithmetic operations to sample from $π$ within error $δ>0$ in the total variation distance from a $w$-warm start, where $L$ is the Lipschitz-constant of $f$, $K$ is contained in a ball of radius $R$ and contains a ball of smaller radius $r$, and $ω$ is the matrix-multiplication constant. When a warm start is not available, it implies an improvement of $\tilde{O}(d^{3.5-ω})$ arithmetic operations on the previous best bound for sampling from $π$ within total variation error $δ$, which was obtained with the hit-and-run algorithm, in the setting where $K$ is a polytope given by $m=O(d)$ inequalities and $LR = O(\sqrt{d})$. When a warm start is available, our algorithm improves by a factor of $d^2$ arithmetic operations on the best previous bound in this setting, which was obtained for a different version of the Dikin walk algorithm. Plugging our Dikin walk Markov chain into the post-processing algorithm of Mangoubi and Vishnoi (2021), we achieve further improvements in the dependence of the running time for the problem of generating samples from $π$ with infinity distance bounds in the special case when $K$ is a polytope.

Graph Neural Network Aided MU-MIMO Detectors arxiv:2206.09381 📈 2

Alva Kosasih, Vincent Onasis, Vera Miloslavskaya, Wibowo Hardjawana, Victor Andrean, Branka Vucetic

**Abstract:** Multi-user multiple-input multiple-output (MU-MIMO) systems can be used to meet high throughput requirements of 5G and beyond networks. A base station serves many users in an uplink MU-MIMO system, leading to a substantial multi-user interference (MUI). Designing a high-performance detector for dealing with a strong MUI is challenging. This paper analyses the performance degradation caused by the posterior distribution approximation used in the state-of-the-art message passing (MP) detectors in the presence of high MUI. We develop a graph neural network based framework to fine-tune the MP detectors' cavity distributions and thus improve the posterior distribution approximation in the MP detectors. We then propose two novel neural network based detectors which rely on the expectation propagation (EP) and Bayesian parallel interference cancellation (BPIC), referred to as the GEPNet and GPICNet detectors, respectively. The GEPNet detector maximizes detection performance, while GPICNet detector balances the performance and complexity. We provide proof of the permutation equivariance property, allowing the detectors to be trained only once, even in the systems with dynamic changes of the number of users. The simulation results show that the proposed GEPNet detector performance approaches maximum likelihood performance in various configurations and GPICNet detector doubles the multiplexing gain of BPIC detector.

mvHOTA: A multi-view higher order tracking accuracy metric to measure spatial and temporal associations in multi-point detection arxiv:2206.09372 📈 2

Lalith Sharan, Halvar Kelm, Gabriele Romano, Matthias Karck, Raffaele De Simone, Sandy Engelhardt

**Abstract:** Multi-object tracking (MOT) is a challenging task that involves detecting objects in the scene and tracking them across a sequence of frames. Evaluating this task is difficult due to temporal occlusions, and varying trajectories across a sequence of images. The main evaluation metric to benchmark MOT methods on datasets such as KITTI has recently become the higher order tracking accuracy (HOTA) metric, which is capable of providing a better description of the performance over metrics such as MOTA, DetA, and IDF1. Point detection and tracking is a closely related task, which could be regarded as a special case of object detection. However, there are differences in evaluating the detection task itself (point distances vs. bounding box overlap). When including the temporal dimension and multi-view scenarios, the evaluation task becomes even more complex. In this work, we propose a multi-view higher order tracking metric (mvHOTA) to determine the accuracy of multi-point (multi-instance and multi-class) detection, while taking into account temporal and spatial associations. mvHOTA can be interpreted as the geometric mean of the detection, association, and correspondence accuracies, thereby providing equal weighting to each of the factors. We demonstrate a use-case through a publicly available endoscopic point detection dataset from a previously organised medical challenge. Furthermore, we compare with other adjusted MOT metrics for this use-case, discuss the properties of mvHOTA, and show how the proposed correspondence accuracy and the Occlusion index facilitate analysis of methods with respect to handling of occlusions. The code will be made publicly available.
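
Reading the abstract literally, the metric combines the three accuracies as a geometric mean (the notation below is assumed, not taken from the paper):

```latex
\mathrm{mvHOTA} \;=\; \sqrt[3]{\mathrm{DetA} \cdot \mathrm{AssA} \cdot \mathrm{CorA}}
```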

Frank-Wolfe-based Algorithms for Approximating Tyler's M-estimator arxiv:2206.09370 📈 2

Lior Danon, Dan Garber

**Abstract:** Tyler's M-estimator is a well-known procedure for robust and heavy-tailed covariance estimation. Tyler himself suggested an iterative fixed-point algorithm for computing his estimator; however, it requires super-linear (in the size of the data) runtime per iteration, which may be prohibitive at large scale. In this work we propose, to the best of our knowledge, the first Frank-Wolfe-based algorithms for computing Tyler's estimator. One variant uses standard Frank-Wolfe steps, the second also considers \textit{away-steps} (AFW), and the third is a \textit{geodesic} version of AFW (GAFW). AFW provably requires, up to a log factor, only linear time per iteration, while GAFW runs in linear time (up to a log factor) in a large-$n$ (number of data-points) regime. All three variants are shown to provably converge to the optimal solution at a sublinear rate, under standard assumptions, despite the fact that the underlying optimization problem is neither convex nor smooth. Under an additional, fairly mild assumption, which holds with probability 1 when the (normalized) data-points are i.i.d. samples from a continuous distribution supported on the entire unit sphere, AFW and GAFW are proved to converge at linear rates. Importantly, all three variants are parameter-free and use adaptive step-sizes.
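
For context, Tyler's classical fixed-point iteration, the baseline the Frank-Wolfe variants aim to improve on, is a few lines of numpy. The heavy-tailed data model below is an illustrative assumption.

```python
# Tyler's original fixed-point iteration, sketched in numpy:
#   S <- (d/n) * sum_i x_i x_i^T / (x_i^T S^{-1} x_i), then rescale the trace.
import numpy as np

rng = np.random.default_rng(0)
n, d = 2000, 5
X = rng.standard_normal((n, d)) * rng.pareto(3.0, size=(n, 1))  # heavy-tailed samples

S = np.eye(d)
for _ in range(50):
    # weight each sample by the inverse of its Mahalanobis-type norm
    w = 1.0 / np.einsum("ij,jk,ik->i", X, np.linalg.inv(S), X)
    S = (d / n) * (X * w[:, None]).T @ X
    S *= d / np.trace(S)           # fix the scale (Tyler's estimator is scale-free)

print(np.round(S, 3))
```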

DASH: Distributed Adaptive Sequencing Heuristic for Submodular Maximization arxiv:2206.09563 📈 1

Tonmoy Dey, Yixin Chen, Alan Kuhnle

**Abstract:** The development of parallelizable algorithms for monotone, submodular maximization subject to a cardinality constraint (SMCC) has resulted in two separate research directions: centralized algorithms with low adaptive complexity, which require random access to the entire dataset; and distributed MapReduce (MR) model algorithms, which use a small number of MR rounds of computation. Currently, no MR model algorithm is known to use a sublinear number of adaptive rounds, which limits their practical performance. We study the SMCC problem in a distributed setting and present three separate MR model algorithms that introduce sublinear adaptivity in a distributed setup. Our primary algorithm, DASH, achieves an approximation of $\frac{1}{2}(1-1/e-\varepsilon)$ using one MR round, while its multi-round variant METADASH enables MR model algorithms to be run on large cardinality constraints that were previously not possible. The two additional algorithms, T-DASH and G-DASH, provide improved ratios of ($\frac{3}{8}-\varepsilon$) and ($1-1/e-\varepsilon$), respectively, using one and $(1/\varepsilon)$ MR rounds. All our proposed algorithms have sublinear adaptive complexity, and we provide extensive empirical evidence to establish that DASH is orders of magnitude faster than state-of-the-art distributed algorithms while producing nearly identical solution values, and to validate the versatility of DASH in obtaining feasible solutions on both centralized and distributed data.
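
As a point of reference, the classical sequential greedy for SMCC, the selection process that DASH and its variants distribute and parallelize, looks like the sketch below. The toy coverage objective is an assumption for illustration, not from the paper.

```python
# Hedged sketch (not DASH itself): the centralized greedy baseline for
# monotone submodular maximization under a cardinality constraint, with a
# toy set-coverage objective.
def coverage(sets_chosen):
    """Number of elements covered by the chosen sets (monotone submodular)."""
    return len(set().union(*sets_chosen)) if sets_chosen else 0

ground = [{1, 2, 3}, {3, 4}, {4, 5, 6, 7}, {7, 8}, {2, 8, 9}]
k = 2                                    # cardinality constraint

solution = []
for _ in range(k):
    # pick the set with the largest marginal gain
    gains = [(coverage(solution + [s]) - coverage(solution), i)
             for i, s in enumerate(ground) if s not in solution]
    best_gain, best_i = max(gains)
    solution.append(ground[best_i])

print("chosen sets:", solution, "coverage:", coverage(solution))
```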

Extracting Fast and Slow: User-Action Embedding with Inter-temporal Information arxiv:2206.09535 📈 1

Akira Matsui, Emilio Ferrara

**Abstract:** With the recent development of technology, data on detailed human temporal behavior have become available. Many methods have been proposed to mine such dynamic behavior data, revealing valuable insights for research and business. However, most methods analyze only the sequence of actions and do not holistically study inter-temporal information such as the time intervals between actions. While actions and the time intervals between them are interdependent, integrating them is challenging because they have different natures: time and action. To overcome this challenge, we propose a unified method that analyzes user actions together with inter-temporal information (time intervals). We simultaneously embed a user's action sequence and its time intervals to obtain a low-dimensional representation of the actions along with inter-temporal information. Using three real-world data sets, we demonstrate that the proposed method characterizes user actions in terms of temporal context, and that explicitly modeling action sequences together with inter-temporal behavioral information enables successful, interpretable analysis.
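
One plausible way to realize such a joint embedding, sketched under the assumption of log-scale interval bucketing and a standard skip-gram model (gensim's Word2Vec). This illustrates the general idea rather than the paper's exact architecture.

```python
# Hedged sketch: discretize each time gap into a log-scale bucket token,
# interleave it with the action tokens, and train a skip-gram model so that
# actions and intervals share one embedding space.
import math
from gensim.models import Word2Vec

events = [("login", 0), ("search", 12), ("click", 3), ("purchase", 600)]  # (action, gap in s)

def gap_token(seconds):
    return f"<gap_{int(math.log2(seconds + 1))}>"   # log-scale time bucket

sequence = []
for action, gap in events:
    sequence += [gap_token(gap), action]            # interleave gaps and actions

model = Word2Vec(sentences=[sequence], vector_size=16, window=3, min_count=1, sg=1)
print(model.wv["purchase"][:4])          # joint action/interval embedding space
```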

Hands-on Wireless Sensing with Wi-Fi: A Tutorial arxiv:2206.09532 📈 1

Zheng Yang, Yi Zhang, Guoxuan Chi, Guidong Zhang

**Abstract:** With the rapid development of wireless communication technology, wireless access points (AP) and Internet of Things (IoT) devices have been widely deployed in our surroundings. Various types of wireless signals (e.g., Wi-Fi, LoRa, LTE) fill our living and working spaces. Previous research reveals that radio waves are modulated by the spatial structure of the environment during propagation (e.g., by reflection, diffraction, and scattering) and superimposed at the receiver. This observation allows us to reconstruct the surrounding environment from received wireless signals, a capability called "wireless sensing". Wireless sensing is an emerging technology that enables a wide range of applications, such as gesture recognition for human-computer interaction, vital-sign monitoring for health care, and intrusion detection for security management. Compared with other sensing paradigms, such as vision-based and IMU-based sensing, wireless sensing solutions have unique advantages such as high coverage, pervasiveness, low cost, and robustness under adverse light and texture conditions. Besides, wireless sensing solutions are generally lightweight in terms of both computational overhead and device size. This tutorial takes Wi-Fi sensing as an example. It introduces both the theoretical principles and the code implementation of data collection, signal processing, feature extraction, and model design. In addition, this tutorial highlights state-of-the-art deep learning models (e.g., CNN, RNN, and adversarial learning models) and their applications in wireless sensing systems. We hope this tutorial will help people in other research fields break into wireless sensing research and learn more about its theory, design, and implementation skills, promoting prosperity in the wireless sensing research field.
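
As a taste of the pipeline such tutorials walk through, here is a hedged sketch of a common first step in Wi-Fi CSI processing: amplitude extraction and linear phase sanitization, run on synthetic CSI. The signal model is an assumption for illustration, not code from the tutorial.

```python
# Hedged sketch: extract per-subcarrier amplitude and remove the linear phase
# term introduced by timing/frequency offsets, a standard CSI preprocessing step.
import numpy as np

rng = np.random.default_rng(0)
n_subcarriers = 30
k = np.arange(n_subcarriers)                           # subcarrier indices
true_phase = 0.4 * np.sin(k / 5)                       # environment-induced phase
csi = np.exp(1j * (true_phase + 0.08 * k + 0.5))       # plus a synthetic linear offset
csi *= 1 + 0.05 * rng.standard_normal(n_subcarriers)   # amplitude noise

amplitude = np.abs(csi)
phase = np.unwrap(np.angle(csi))
slope, intercept = np.polyfit(k, phase, 1)             # least-squares linear fit
sanitized = phase - (slope * k + intercept)            # offset-free phase feature

print(amplitude[:5], sanitized[:5])
```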

Temporal Link Prediction via Adjusted Sigmoid Function and 2-Simplex Structure arxiv:2206.09529 📈 1

Ruizhi Zhang, Qiaozi Wang, Qiming Yang, Wei Wei

**Abstract:** Temporal network link prediction is an important task in network science and has a wide range of practical applications. Revealing the evolutionary mechanism of a network is essential for link prediction, and how to effectively use the historical information of temporal links and efficiently extract high-order patterns of the network structure remains a vital challenge. To address these issues, we propose a novel temporal link prediction model with an adjusted sigmoid function and 2-simplex structure (TLPSS). The adjusted sigmoid decay mode takes the active, decaying, and stable states of edges into account, which properly fits the life cycle of information. Moreover, a latent matrix sequence composed of simplex high-order structures is introduced to enhance the performance of the link prediction method, since it is highly feasible in sparse networks. By combining the life cycle of information with the simplex high-order structure, TLPSS satisfies the consistency of temporal and structural information in dynamic networks. Experimental results on six real-world datasets demonstrate the effectiveness of TLPSS, which improves link prediction performance by an average of 15% over baseline methods.
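
A sigmoid-shaped temporal decay of edge weights can be sketched as below: recent interactions stay near full weight ("active"), then decay, then level off ("stable"). The parameterization (midpoint, steepness, floor) is hypothetical and not the exact adjusted sigmoid used in TLPSS.

```python
# Hedged sketch of a sigmoid-shaped decay over edge age, mimicking the
# active -> decaying -> stable life cycle described in the abstract.
import numpy as np

def edge_weight(age, midpoint=10.0, steepness=0.5, floor=0.1):
    """Weight of an edge observed `age` time steps ago (hypothetical form)."""
    decay = 1.0 / (1.0 + np.exp(steepness * (age - midpoint)))
    return floor + (1.0 - floor) * decay

ages = np.array([0, 5, 10, 20, 40])
print(np.round(edge_weight(ages), 3))    # ~1 when fresh, levels off at the floor
```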

The Power of Regularization in Solving Extensive-Form Games arxiv:2206.09495 📈 1

Mingyang Liu, Asuman Ozdaglar, Tiancheng Yu, Kaiqing Zhang

**Abstract:** In this paper, we investigate the power of regularization, a common technique in reinforcement learning and optimization, in solving extensive-form games (EFGs). We propose a series of new algorithms based on regularizing the payoff functions of the game, and establish a set of convergence results that strictly improve over the existing ones, with either weaker assumptions or stronger convergence guarantees. In particular, we first show that dilated optimistic mirror descent (DOMD), an efficient variant of OMD for solving EFGs, with adaptive regularization can achieve a fast $\tilde O(1/T)$ last-iterate convergence in terms of duality gap without the uniqueness assumption of the Nash equilibrium (NE). Moreover, regularized dilated optimistic multiplicative weights update (Reg-DOMWU), an instance of Reg-DOMD, further enjoys a $\tilde O(1/T)$ last-iterate convergence rate of the distance to the set of NE. This addresses an open question on whether iterate convergence can be obtained for OMWU algorithms without the uniqueness assumption in both the EFG and normal-form game literature. Second, we show that regularized counterfactual regret minimization (Reg-CFR), with a variant of the optimistic mirror descent algorithm as regret-minimizer, can achieve $O(1/T^{1/4})$ best-iterate and $O(1/T^{3/4})$ average-iterate convergence rates for finding NE in EFGs. Finally, we show that Reg-CFR can achieve asymptotic last-iterate convergence, and an optimal $O(1/T)$ average-iterate convergence rate, for finding the NE of perturbed EFGs, which is useful for finding approximate extensive-form perfect equilibria (EFPE). To the best of our knowledge, these constitute the first last-iterate convergence results for CFR-type algorithms, while matching the SOTA average-iterate convergence rate in finding NE for non-perturbed EFGs. We also provide numerical results to corroborate the advantages of our algorithms.
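
For a concrete feel of the optimistic machinery, here is optimistic multiplicative weights (OMWU) on a zero-sum *normal-form* game, the simplest analogue of the dilated methods the paper analyzes for EFGs. The game and step size are illustrative assumptions, and the paper's payoff regularization is omitted.

```python
# Hedged sketch: OMWU on rock-paper-scissors. The "optimistic" step uses the
# prediction 2*g_t - g_{t-1} in place of the current gradient, which is what
# stabilizes the last iterate (plain MWU would cycle on this game).
import numpy as np

A = np.array([[0.0, 1.0, -1.0],          # payoff to the row (max) player
              [-1.0, 0.0, 1.0],
              [1.0, -1.0, 0.0]])
x = np.ones(3) / 3                       # row player's mixed strategy
y = np.ones(3) / 3                       # column player's mixed strategy
gx_prev, gy_prev = A @ y, A.T @ x
eta = 0.1

for _ in range(2000):
    gx, gy = A @ y, A.T @ x
    x = x * np.exp(eta * (2 * gx - gx_prev))       # optimistic ascent (max player)
    y = y * np.exp(-eta * (2 * gy - gy_prev))      # optimistic descent (min player)
    x, y = x / x.sum(), y / y.sum()
    gx_prev, gy_prev = gx, gy

print(np.round(x, 3), np.round(y, 3))    # approaches the uniform Nash equilibrium
```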

An Analysis of the Admissibility of the Objective Functions Applied in Evolutionary Multi-objective Clustering arxiv:2206.09483 📈 1

Cristina Y. Morimoto, Aurora Pozo, Marcílio C. P. de Souto

**Abstract:** A variety of clustering criteria have been applied as objective functions in Evolutionary Multi-Objective Clustering approaches (EMOCs). However, most EMOCs do not provide a detailed analysis of the choice and usage of the objective functions. Aiming to support a better choice and definition of the objectives in EMOCs, this paper proposes an analysis of the admissibility of clustering criteria in evolutionary optimization, examining the search direction and its potential to find optimal results. As a result, we demonstrate how the admissibility of the objective functions can influence the optimization. Furthermore, we provide insights regarding the combinations and usage of clustering criteria in EMOCs.
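
Two clustering criteria commonly paired as objectives in evolutionary multi-objective clustering (e.g., in MOCK: overall within-cluster deviation and connectivity, both minimized) can be sketched as follows. This is background illustration of the kind of criteria the paper analyzes, not the paper's own analysis.

```python
# Hedged sketch of two common EMOC objectives evaluated on a candidate labeling.
import numpy as np

def deviation(X, labels):
    """Sum of distances from each point to its cluster centroid (compactness)."""
    return sum(np.linalg.norm(X[labels == c] - X[labels == c].mean(axis=0), axis=1).sum()
               for c in np.unique(labels))

def connectivity(X, labels, L=3):
    """Penalty for nearest neighbours placed in different clusters."""
    D = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    penalty = 0.0
    for i in range(len(X)):
        neighbors = np.argsort(D[i])[1:L + 1]        # L nearest neighbours of i
        penalty += sum(1.0 / (j + 1)                 # heavier penalty for closer splits
                       for j, nb in enumerate(neighbors) if labels[nb] != labels[i])
    return penalty

X = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
labels = np.array([0, 0, 0, 1, 1, 1])
print(deviation(X, labels), connectivity(X, labels))
```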

Predicting Human Performance in Vertical Hierarchical Menu Selection in Immersive AR Using Hand-gesture and Head-gaze arxiv:2206.09480 📈 1

Majid Pourmemar, Yashas Joshi, Charalambos Poullis

**Abstract:** There are currently limited guidelines on designing user interfaces (UI) for immersive augmented reality (AR) applications. Designers must draw on their experience designing UIs for desktop and mobile applications and conjecture how a UI will influence AR users' performance. In this work, we introduce a predictive model for determining users' performance on a target UI without involving participants in further user studies. The model is trained on participants' responses to objective performance measures, such as consumed endurance (CE) and pointing time (PT), using hierarchical drop-down menus. Large variability in the depth and context of the menus is ensured by randomly and dynamically creating the hierarchical drop-down menus and their associated user tasks from words in the lexical database WordNet. Subjective performance bias is reduced by incorporating the users' standardized non-verbal performance (WAIS-IV) during model training. The semantic information of the menu is encoded using the Universal Sentence Encoder. We present the results of a user study demonstrating that the proposed predictive model achieves high accuracy in predicting CE on hierarchical menus for users with various cognitive abilities. To the best of our knowledge, this is the first work on predicting CE when designing UIs for immersive AR applications.
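
The modeling pattern described above can be sketched as a regression from menu features to CE. Everything below is a hypothetical illustration: synthetic features and labels, a random-forest regressor standing in for the paper's model, and a random matrix standing in for Universal Sentence Encoder vectors.

```python
# Hedged sketch: predict a performance measure (consumed endurance) from a
# sentence-embedding feature vector plus simple structural menu features.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n, embed_dim = 200, 32
menu_embedding = rng.standard_normal((n, embed_dim))   # stand-in for USE vectors
menu_depth = rng.integers(1, 6, size=(n, 1))           # hierarchy-depth feature
X = np.hstack([menu_embedding, menu_depth])
ce = 3.0 * menu_depth.ravel() + rng.normal(0, 0.5, n)  # synthetic consumed endurance

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, ce)
print(model.predict(X[:3]))              # predicted CE for three synthetic menus
```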

Compression and Data Similarity: Combination of Two Techniques for Communication-Efficient Solving of Distributed Variational Inequalities arxiv:2206.09446 📈 1

Aleksandr Beznosikov, Alexander Gasnikov

**Abstract:** Variational inequalities are an important tool that covers minimization, saddle-point problems, games, and fixed-point problems. Modern large-scale and computationally expensive practical applications make distributed methods for solving these problems popular. Meanwhile, most distributed systems face a fundamental issue: a communication bottleneck. Various techniques exist to deal with it. In particular, in this paper we consider a combination of two popular approaches: compression and data similarity. We show that this synergy can be more effective than either approach separately in solving distributed smooth strongly monotone variational inequalities. Experiments confirm the theoretical conclusions.
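
The compression half of the recipe can be illustrated with an unbiased rand-k sparsifier inside a gradient-type loop for a strongly monotone saddle-point problem. The problem instance and step size are assumptions, and the data-similarity half (preconditioning by a similar local operator) is omitted; this only illustrates the communication-saving step.

```python
# Hedged sketch: rand-k compression of the operator values in a descent loop
# for the strongly monotone saddle problem (mu/2)||x||^2 + x^T B y - (mu/2)||y||^2,
# whose unique solution is (0, 0).
import numpy as np

rng = np.random.default_rng(0)

def rand_k(v, k):
    """Unbiased rand-k compression: keep k random coordinates, rescale by d/k."""
    d = v.size
    idx = rng.choice(d, size=k, replace=False)
    out = np.zeros_like(v)
    out[idx] = v[idx] * (d / k)
    return out

B = rng.standard_normal((10, 10)) * 0.1
mu, eta = 1.0, 0.2
x, y = rng.standard_normal(10), rng.standard_normal(10)
for _ in range(500):
    gx = mu * x + B @ y                  # operator block for x
    gy = mu * y - B.T @ x                # operator block for y
    x = x - eta * rand_k(gx, 3)          # "communicate" only 3 of 10 coordinates
    y = y - eta * rand_k(gy, 3)

print(np.linalg.norm(x), np.linalg.norm(y))  # both decay toward the solution at 0
```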

$C^*$-algebra Net: A New Approach Generalizing Neural Network Parameters to $C^*$-algebra arxiv:2206.09513 📈 0

Yuka Hashimoto, Zhao Wang, Tomoko Matsui

**Abstract:** We propose a new framework that generalizes the parameters of neural network models to $C^*$-algebra-valued ones. A $C^*$-algebra is a generalization of the space of complex numbers; a typical example is the space of continuous functions on a compact space. This generalization enables us to combine multiple models continuously and to use tools for functions, such as regression and integration. Consequently, we can learn features of data efficiently and adapt models to problems continuously. We apply our framework to practical problems such as density estimation and few-shot learning, and show that our framework enables us to learn features of data even with a limited number of samples. Our new framework highlights the potential of applying the theory of $C^*$-algebras to general neural network models.
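
A minimal sketch of the core idea, assuming weights valued in the commutative $C^*$-algebra $C([0,1])$ and represented by low-degree polynomials: each weight becomes a function of a point $s \in [0,1]$, so evaluating at different $s$ yields a continuum of interrelated models. This illustrates the concept only, not the paper's training procedure.

```python
# Hedged sketch: a layer whose weight matrix is function-valued. Each entry
# is a degree-2 polynomial in s, i.e., an element of C([0,1]); the polynomial
# representation is an illustrative assumption.
import numpy as np

rng = np.random.default_rng(0)
in_dim, out_dim, degree = 3, 2, 2
coeffs = rng.standard_normal((degree + 1, out_dim, in_dim)) * 0.5

def weight(s):
    """Evaluate the function-valued weight matrix at the point s in [0, 1]."""
    return sum(coeffs[p] * s ** p for p in range(degree + 1))

def layer(x, s):
    return np.tanh(weight(s) @ x)        # one C([0,1])-parameterized layer

x = rng.standard_normal(in_dim)
for s in (0.0, 0.5, 1.0):
    print(s, layer(x, s))                # a continuum of models indexed by s
```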

Prev: 2022.06.18 Next: 2022.06.20