Brain Science and AI Arxiv Daily Paper Digest 2023.03.11



2023-03-13 19:20 | Source: web aggregation | Views: 265


【1】MOREA: a GPU-accelerated Evolutionary Algorithm for Multi-Objective Deformable Registration of 3D Medical Images

Authors: Georgios Andreadis et al.

Link: https://arxiv.org/abs/2303.04873

Finding a realistic deformation that transforms one image into another, in case large deformations are required, is considered a key challenge in medical image analysis. Having a proper image registration approach to achieve this could unleash a number of applications requiring information to be transferred between images. Clinical adoption is currently hampered by many existing methods requiring extensive configuration effort before each use, or not being able to (realistically) capture large deformations. A recent multi-objective approach that uses the Multi-Objective Real-Valued Gene-pool Optimal Mixing Evolutionary Algorithm (MO-RV-GOMEA) and a dual-dynamic mesh transformation model has shown promise, exposing the trade-offs inherent to image registration problems and modeling large deformations in 2D. This work builds on this promise and introduces MOREA: the first evolutionary algorithm-based multi-objective approach to deformable registration of 3D images capable of tackling large deformations. MOREA includes a 3D biomechanical mesh model for physical plausibility and is fully GPU-accelerated. We compare MOREA to two state-of-the-art approaches on abdominal CT scans of 4 cervical cancer patients, with the latter two approaches configured for the best results per patient. Without requiring per-patient configuration, MOREA significantly outperforms these approaches on 3 of the 4 patients that represent the most difficult cases.
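The multi-objective selection at the heart of such approaches — maintaining a front of non-dominated trade-offs instead of optimizing a single weighted score — can be sketched as a toy (mu + lambda) evolutionary loop. This is a minimal illustration, not MO-RV-GOMEA or MOREA's biomechanical mesh model; `toy_objectives` is a made-up stand-in for the competing image-mismatch and deformation-magnitude objectives:

```python
import random

def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(pop, objs):
    """Non-dominated subset of the population."""
    return [p for i, p in enumerate(pop)
            if not any(dominates(objs[j], objs[i])
                       for j in range(len(pop)) if j != i)]

def evolve(objective, dim=2, pop_size=16, gens=30, seed=0):
    """Toy (mu + lambda) multi-objective EA: mutate, merge, keep the front."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(gens):
        children = [[x + rng.gauss(0, 0.1) for x in p] for p in pop]
        merged = pop + children
        front = pareto_front(merged, [objective(p) for p in merged])
        # refill to pop_size by cloning random front members
        pop = (front + [rng.choice(front) for _ in range(pop_size)])[:pop_size]
    return pareto_front(pop, [objective(p) for p in pop])

def toy_objectives(d):
    """Stand-ins for the registration trade-off: 'image mismatch' pulls the
    deformation toward (1, 1), 'deformation magnitude' toward (0, 0)."""
    return (sum((x - 1.0) ** 2 for x in d), sum(x * x for x in d))

front = evolve(toy_objectives)
```

Every member of the returned front represents a different mismatch/magnitude trade-off, mirroring how a multi-objective registration method exposes the inherent trade-offs rather than committing to one weighting up front.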

【2】NIFF: Alleviating Forgetting in Generalized Few-Shot Object Detection via Neural Instance Feature Forging

Authors: Karim Guirguis et al.

Link: https://arxiv.org/abs/2303.04958

Privacy and memory are two recurring themes in a broad conversation about the societal impact of AI. These concerns arise from the need for huge amounts of data to train deep neural networks. A promise of Generalized Few-shot Object Detection (G-FSOD), a learning paradigm in AI, is to alleviate the need for collecting abundant training samples of novel classes we wish to detect by leveraging prior knowledge from old classes (i.e., base classes). G-FSOD strives to learn these novel classes while alleviating catastrophic forgetting of the base classes. However, existing approaches assume that the base images are accessible, an assumption that does not hold when sharing and storing data is problematic. In this work, we propose the first data-free knowledge distillation (DFKD) approach for G-FSOD that leverages the statistics of the region of interest (RoI) features from the base model to forge instance-level features without accessing the base images. Our contribution is three-fold: (1) we design a standalone lightweight generator with (2) class-wise heads (3) to generate and replay diverse instance-level base features to the RoI head while finetuning on the novel data. This stands in contrast to standard DFKD approaches in image classification, which invert the entire network to generate base images. Moreover, we make careful design choices in the novel finetuning pipeline to regularize the model. We show that our approach can dramatically reduce the base memory requirements, all while setting a new standard for G-FSOD on the challenging MS-COCO and PASCAL-VOC benchmarks.
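The data-free replay idea — keeping only feature statistics of the base classes instead of the images themselves — can be sketched as follows. Note the simplification: NIFF trains a lightweight generator with class-wise heads, whereas this toy version samples directly from per-class Gaussian statistics, and all names and numbers are illustrative:

```python
import math
import random

def fit_class_stats(features_by_class):
    """Per-class, per-dimension mean/std of (simulated) base RoI features."""
    stats = {}
    for cls, feats in features_by_class.items():
        n, dim = len(feats), len(feats[0])
        means = [sum(f[d] for f in feats) / n for d in range(dim)]
        stds = [math.sqrt(sum((f[d] - means[d]) ** 2 for f in feats) / n)
                for d in range(dim)]
        stats[cls] = (means, stds)
    return stats

def forge_features(stats, cls, n, seed=0):
    """Replay n forged instance-level features for class cls -- no base
    images are stored or accessed."""
    rng = random.Random(seed)
    means, stds = stats[cls]
    return [[rng.gauss(m, s) for m, s in zip(means, stds)] for _ in range(n)]

# Illustrative base-class features (would come from the base model's RoI head).
base_feats = {"car": [[2.0, 0.0], [2.2, 0.1], [1.8, -0.1]],
              "dog": [[-1.0, 1.0], [-1.2, 0.9], [-0.8, 1.1]]}
stats = fit_class_stats(base_feats)
forged = forge_features(stats, "car", 200, seed=1)
```

The forged features would be replayed to the RoI head during finetuning on novel classes, so the base memory cost is a handful of statistics per class rather than an image archive.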

【3】Identifying Systematic Errors of Image Classifiers on Rare Subgroups

Authors: Jan Hendrik Metzen et al.

Link: https://arxiv.org/abs/2303.05072

Despite excellent average-case performance of many image classifiers, their performance can substantially deteriorate on semantically coherent subgroups of the data that were under-represented in the training data. These systematic errors can impact both fairness for demographic minority groups as well as robustness and safety under domain shift. A major challenge is to identify such subgroups with subpar performance when the subgroups are not annotated and their occurrence is very rare. We leverage recent advances in text-to-image models and search in the space of textual descriptions of subgroups ("prompts") for subgroups where the target model has low performance on the prompt-conditioned synthesized data. To tackle the exponentially growing number of subgroups, we employ combinatorial testing. We denote this procedure as PromptAttack as it can be interpreted as an adversarial attack in a prompt space. We study subgroup coverage and identifiability with PromptAttack in a controlled setting and find that it identifies systematic errors with high accuracy. Thereupon, we apply PromptAttack to ImageNet classifiers and identify novel systematic errors on rare subgroups.
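A minimal sketch of the search loop: enumerate subgroup descriptions as combinations of attribute values, render each into a prompt, and rank subgroups by the target model's accuracy on the prompt-conditioned data. Here `fake_accuracy` is a hypothetical stub standing in for synthesizing images with a text-to-image model and evaluating the classifier, and the paper's combinatorial-testing strategy for taming the exponential space is omitted:

```python
from itertools import product

def build_prompts(class_name, attributes):
    """Enumerate subgroup prompts as the cross product of attribute values."""
    keys = sorted(attributes)
    prompts = []
    for combo in product(*(attributes[k] for k in keys)):
        desc = ", ".join(f"{k}: {v}" for k, v in zip(keys, combo))
        prompts.append((combo, f"a photo of a {class_name} ({desc})"))
    return prompts

def worst_subgroups(prompts, accuracy_fn, k=3):
    """Rank subgroups by the model's accuracy on prompt-conditioned data."""
    scored = [(accuracy_fn(p), combo, p) for combo, p in prompts]
    return sorted(scored)[:k]

attributes = {"weather": ["sunny", "snow"], "time_of_day": ["day", "night"]}
prompts = build_prompts("stop sign", attributes)

def fake_accuracy(prompt):
    """Hypothetical stub: hard-codes one weak subgroup for illustration."""
    return 0.2 if "snow" in prompt and "night" in prompt else 0.9

worst = worst_subgroups(prompts, fake_accuracy, k=1)
```

The lowest-accuracy combinations are the candidate systematic errors, here the (night, snow) subgroup.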

【4】SLCA: Slow Learner with Classifier Alignment for Continual Learning on a Pre-trained Model

Authors: Gengwei Zhang et al.

Link: https://arxiv.org/abs/2303.05118

The goal of continual learning is to improve the performance of recognition models in learning sequentially arrived data. Although most existing works are established on the premise of learning from scratch, growing efforts have been devoted to incorporating the benefits of pre-training. However, how to adaptively exploit the pre-trained knowledge for each incremental task while maintaining its generalizability remains an open question. In this work, we present an extensive analysis for continual learning on a pre-trained model (CLPM), and attribute the key challenge to a progressive overfitting problem. Observing that selectively reducing the learning rate can almost resolve this issue in the representation layer, we propose a simple but extremely effective approach named Slow Learner with Classifier Alignment (SLCA), which further improves the classification layer by modeling the class-wise distributions and aligning the classification layers in a post-hoc fashion. Across a variety of scenarios, our proposal provides substantial improvements for CLPM (e.g., up to 49.76%, 50.05%, 44.69% and 40.16% on Split CIFAR-100, Split ImageNet-R, Split CUB-200 and Split Cars-196, respectively), and thus outperforms state-of-the-art approaches by a large margin. Based on such a strong baseline, critical factors and promising directions are analyzed in-depth to facilitate subsequent research.
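The classifier-alignment half of the recipe can be illustrated with a simplification: keep running per-class feature means while tasks arrive, then classify post hoc by nearest class mean. SLCA itself models full class-wise distributions and realigns the trained classification layer on features sampled from them, so treat this as a sketch of the idea rather than the method:

```python
def class_means(features, labels):
    """Running per-class feature means, maintainable as tasks arrive
    sequentially (no old images need to be stored)."""
    sums, counts = {}, {}
    for f, y in zip(features, labels):
        if y not in sums:
            sums[y], counts[y] = [0.0] * len(f), 0
        sums[y] = [s + v for s, v in zip(sums[y], f)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}

def aligned_predict(means, feature):
    """Post-hoc 'aligned' head, reduced here to nearest class mean."""
    return min(means, key=lambda y: sum((m - x) ** 2
                                        for m, x in zip(means[y], feature)))

# Features from two tasks learned sequentially (illustrative numbers).
means = class_means([[0.1, 0.0], [-0.1, 0.1], [2.0, 2.1], [1.9, 1.8]],
                    [0, 0, 1, 1])
```

Because the stored statistics cover all classes seen so far, the realigned head is not biased toward the most recent task, which is the forgetting failure mode the alignment step targets.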

【5】Classification in Histopathology: A unique deep embeddings extractor for multiple classification tasks

Authors: Adrien Nivaggioli et al.

Link: https://arxiv.org/abs/2303.05180

In biomedical imaging, deep learning-based methods are state-of-the-art for every modality (virtual slides, MRI, etc.). In histopathology, these methods can be used to detect certain biomarkers or classify lesions. However, such techniques require large amounts of data to train high-performing models, and such data can be intrinsically difficult to acquire, especially when it comes to scarce biomarkers. To address this challenge, we use a single, pre-trained, deep embeddings extractor to convert images into deep features and train a small, dedicated classification head on these embeddings for each classification task. This approach offers several benefits, such as the ability to reuse a single pre-trained deep network for various tasks; reducing the amount of labeled data needed, as classification heads have fewer parameters; and accelerating training time by up to 1000 times, which allows for much more tuning of the classification head. In this work, we perform an extensive comparison of various open-source backbones and assess their fit to the target histological image domain. This is achieved using a novel method based on a proxy classification task. We demonstrate that, thanks to this selection method, an optimal feature extractor can be selected for different tasks on the target domain. We also introduce a feature-space augmentation strategy which proves to substantially improve the final metrics computed for the different tasks considered. To demonstrate the benefit of such backbone selection and feature-space augmentation, our experiments are carried out on three separate classification tasks and show a clear improvement on each of them: microcalcifications (29.1% F1-score increase), lymph node metastasis (12.5% F1-score increase), mitosis (15.0% F1-score increase).
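The feature-space augmentation idea — enlarging the head's training set by perturbing stored embeddings rather than re-running the frozen extractor on augmented images — might look like this minimal sketch (Gaussian jitter is an assumed, illustrative choice, and `sigma` is a made-up hyperparameter, not the paper's):

```python
import random

def augment_embeddings(embeddings, labels, n_new, sigma=0.05, seed=0):
    """Feature-space augmentation: jitter stored embeddings with Gaussian
    noise instead of re-running the frozen extractor on augmented images."""
    rng = random.Random(seed)
    new_e, new_y = [], []
    for _ in range(n_new):
        i = rng.randrange(len(embeddings))
        new_e.append([v + rng.gauss(0.0, sigma) for v in embeddings[i]])
        new_y.append(labels[i])
    return embeddings + new_e, labels + new_y

emb, lab = [[1.0, 2.0], [3.0, 4.0]], [0, 1]
aug_e, aug_y = augment_embeddings(emb, lab, n_new=10)
```

Since the extractor is frozen, this augmentation is nearly free, which is part of why the small heads can be retrained and tuned so quickly.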

【6】Can large language models build causal graphs?

Authors: Stephanie Long et al.

Link: https://arxiv.org/abs/2303.05279

Building causal graphs can be a laborious process. To ensure all relevant causal pathways have been captured, researchers often have to discuss with clinicians and experts while also reviewing extensive relevant medical literature. By encoding common and medical knowledge, large language models (LLMs) represent an opportunity to ease this process by automatically scoring edges (i.e., connections between two variables) in potential graphs. LLMs however have been shown to be brittle to the choice of probing words, context, and prompts that the user employs. In this work, we evaluate if LLMs can be a useful tool in complementing causal graph development.
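Edge scoring with an LLM reduces to a loop over ordered variable pairs. In this hedged sketch, `stub_llm_score` is a hypothetical stand-in for an actual LLM call returning a probability that the answer is "yes"; the prompt template and threshold are illustrative — and the paper's point is precisely that results are brittle to such choices:

```python
from itertools import permutations

def edge_prompt(cause, effect):
    """Illustrative prompt template (results are brittle to this choice)."""
    return f"Does {cause} directly cause {effect}? Answer yes or no."

def build_graph(variables, score_fn, threshold=0.5):
    """Score every ordered variable pair; keep edges above the threshold."""
    return [(a, b) for a, b in permutations(variables, 2)
            if score_fn(edge_prompt(a, b)) > threshold]

def stub_llm_score(prompt):
    """Hypothetical stand-in for an LLM returning P(answer == 'yes')."""
    return 0.9 if "smoking" in prompt and "cause cancer" in prompt else 0.1

edges = build_graph(["smoking", "cancer", "age"], stub_llm_score)
```

The resulting edge list is a candidate graph for experts to review, complementing rather than replacing literature review and clinician input.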

【7】Knowledge-augmented Few-shot Visual Relation Detection

Authors: Tianyu Yu et al.

Link: https://arxiv.org/abs/2303.05342

Visual Relation Detection (VRD) aims to detect relationships between objects for image understanding. Most existing VRD methods rely on thousands of training samples of each relationship to achieve satisfactory performance. Some recent papers tackle this problem by few-shot learning with elaborately designed pipelines and pre-trained word vectors. However, the performance of existing few-shot VRD models is severely hampered by poor generalization capability, as they struggle to handle the vast semantic diversity of visual relationships. Nonetheless, humans have the ability to learn new relationships with just a few examples based on their knowledge. Inspired by this, we devise a knowledge-augmented, few-shot VRD framework leveraging both textual knowledge and visual relation knowledge to improve the generalization ability of few-shot VRD. The textual knowledge and visual relation knowledge are acquired from a pre-trained language model and an automatically constructed visual relation knowledge graph, respectively. We extensively validate the effectiveness of our framework. Experiments conducted on three benchmarks from the commonly used Visual Genome dataset show that our performance surpasses existing state-of-the-art models by a large margin.

【8】TOLD: A Novel Two-Stage Overlap-Aware Framework for Speaker Diarization

Authors: Jiaming Wang et al.

Link: https://arxiv.org/abs/2303.05397

Recently, end-to-end neural diarization (EEND) has been introduced and achieves promising results in speaker-overlapped scenarios. In EEND, speaker diarization is formulated as a multi-label prediction problem, where speaker activities are estimated independently and their dependencies are not well considered. To overcome these disadvantages, we employ the power set encoding to reformulate speaker diarization as a single-label classification problem and propose the overlap-aware EEND (EEND-OLA) model, in which speaker overlaps and dependencies can be modeled explicitly. Inspired by the success of two-stage hybrid systems, we further propose a novel Two-stage OverLap-aware Diarization framework (TOLD) by involving a speaker overlap-aware post-processing (SOAP) model to iteratively refine the diarization results of EEND-OLA. Experimental results show that, compared with the original EEND, the proposed EEND-OLA achieves a 14.39% relative improvement in terms of diarization error rate (DER), and utilizing SOAP provides another 19.33% relative improvement. As a result, our method TOLD achieves a DER of 10.14% on the CALLHOME dataset, which is a new state-of-the-art result on this benchmark to the best of our knowledge.
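The power set encoding that turns multi-label diarization into single-label classification is simple to state: a frame with n speakers has a binary activity vector, which maps bijectively onto one of 2^n classes. A minimal sketch:

```python
def powerset_encode(activity):
    """Binary speaker-activity vector (e.g. [1, 0, 1]) -> one class index
    in {0, ..., 2**n_speakers - 1}."""
    idx = 0
    for bit in activity:
        idx = (idx << 1) | bit
    return idx

def powerset_decode(idx, n_speakers):
    """Inverse map: class index -> multi-label activity vector."""
    return [(idx >> (n_speakers - 1 - i)) & 1 for i in range(n_speakers)]
```

With this encoding, overlap patterns (e.g. speakers 1 and 3 active together) become explicit classes, so their dependencies can be modeled directly instead of being predicted independently per speaker.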

【9】FaceXHuBERT: Text-less Speech-driven E(X)pressive 3D Facial Animation Synthesis Using Self-Supervised Speech Representation Learning

Authors: Kazi Injamamul Haque et al.

Link: https://arxiv.org/abs/2303.05416

This paper presents FaceXHuBERT, a text-less speech-driven 3D facial animation generation method that can capture personalized and subtle cues in speech (e.g. identity, emotion and hesitation). It is also very robust to background noise and can handle audio recorded in a variety of situations (e.g. multiple people speaking). Recent approaches employ end-to-end deep learning, taking both audio and text as input to generate facial animation for the whole face. However, the scarcity of publicly available expressive audio-3D facial animation datasets poses a major bottleneck. The resulting animations still have issues regarding accurate lip-syncing, expressivity, person-specific information and generalizability. We effectively employ a self-supervised pretrained HuBERT model in the training process, which allows us to incorporate both lexical and non-lexical information in the audio without using a large lexicon. Additionally, guiding the training with a binary emotion condition and speaker identity distinguishes the tiniest subtle facial motions. We carried out extensive objective and subjective evaluations in comparison to ground truth and state-of-the-art work. A perceptual user study demonstrates that our approach produces superior results with respect to the realism of the animation 78% of the time in comparison to the state of the art. In addition, our method is 4 times faster, eliminating the use of complex sequential models such as transformers. We strongly recommend watching the supplementary video before reading the paper. We also provide the implementation and evaluation code via a GitHub repository link.

【10】On the Expressiveness and Generalization of Hypergraph Neural Networks

Authors: Zhezheng Luo et al.

Link: https://arxiv.org/abs/2303.05490

This extended abstract describes a framework for analyzing the expressiveness, learning, and (structural) generalization of hypergraph neural networks (HyperGNNs). Specifically, we focus on how HyperGNNs can learn from finite datasets and generalize structurally to graph reasoning problems of arbitrary input sizes. Our first contribution is a fine-grained analysis of the expressiveness of HyperGNNs, that is, the set of functions that they can realize. Our result is a hierarchy of problems they can solve, defined in terms of various hyperparameters such as depths and edge arities. Next, we analyze the learning properties of these neural networks, especially focusing on how they can be trained on a finite set of small graphs and generalize to larger graphs, which we term structural generalization. Our theoretical results are further supported by the empirical results.
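For concreteness, one permutation-invariant hypergraph message-passing step — the kind of computation whose expressiveness the analysis characterizes via hyperparameters like depth and edge arity — can be sketched as two pooling stages. The mean/sum aggregators and the absence of learned weights are simplifications:

```python
def hypergnn_layer(node_feats, hyperedges):
    """One message-passing step on a hypergraph: each hyperedge mean-pools
    its member nodes, then each node sum-pools its incident hyperedges."""
    dim = len(node_feats[0])
    edge_msgs = [[sum(node_feats[v][d] for v in e) / len(e) for d in range(dim)]
                 for e in hyperedges]
    out = []
    for v in range(len(node_feats)):
        agg = [0.0] * dim
        for e, msg in zip(hyperedges, edge_msgs):
            if v in e:
                agg = [a + m for a, m in zip(agg, msg)]
        out.append(agg)
    return out
```

Because the layer only pools over sets, it applies unchanged to graphs of any size — the property that makes structural generalization from small training graphs to larger test graphs meaningful.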

【11】TANGOS: Regularizing Tabular Neural Networks through Gradient Orthogonalization and Specialization

Authors: Alan Jeffares et al.

Link: https://arxiv.org/abs/2303.05506

Despite their success with unstructured data, deep neural networks are not yet a panacea for structured tabular data. In the tabular domain, their efficiency crucially relies on various forms of regularization to prevent overfitting and provide strong generalization performance. Existing regularization techniques include broad modelling decisions such as choice of architecture, loss functions, and optimization methods. In this work, we introduce Tabular Neural Gradient Orthogonalization and Specialization (TANGOS), a novel framework for regularization in the tabular setting built on latent unit attributions. The gradient attribution of an activation with respect to a given input feature suggests how the neuron attends to that feature, and is often employed to interpret the predictions of deep networks. In TANGOS, we take a different approach and incorporate neuron attributions directly into training to encourage orthogonalization and specialization of latent attributions in a fully-connected network. Our regularizer encourages neurons to focus on sparse, non-overlapping input features and results in a set of diverse and specialized latent units. In the tabular domain, we demonstrate that our approach can lead to improved out-of-sample generalization performance, outperforming other popular regularization methods. We provide insight into why our regularizer is effective and demonstrate that TANGOS can be applied jointly with existing methods to achieve even greater generalization performance.
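The regularizer itself can be sketched given an attribution matrix, in practice obtained by autodiff as the gradient of each latent unit's activation with respect to the input features. The two terms below — an L1-style specialization term and a mean pairwise |cosine| orthogonalization term — follow the paper's description, but the exact weighting and normalization here are illustrative:

```python
import math

def _abs_cos(a, b):
    """|cosine similarity| between two attribution vectors."""
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return abs(sum(x * y for x, y in zip(a, b))) / den if den else 0.0

def tangos_penalty(attributions, w_spec=1.0, w_orth=1.0):
    """attributions[i] = gradient of latent unit i w.r.t. input features.
    Specialization: mean |attribution| (push units toward sparse features).
    Orthogonalization: mean pairwise |cos| (push units apart)."""
    spec = (sum(abs(x) for a in attributions for x in a)
            / sum(len(a) for a in attributions))
    pairs = [(i, j) for i in range(len(attributions))
             for j in range(i + 1, len(attributions))]
    orth = (sum(_abs_cos(attributions[i], attributions[j]) for i, j in pairs)
            / len(pairs)) if pairs else 0.0
    return w_spec * spec + w_orth * orth
```

Added to the task loss during training, the penalty is lowest when units attend to sparse, non-overlapping feature sets — the diverse, specialized latent units the method aims for.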

【12】Exploiting Contextual Structure to Generate Useful Auxiliary Tasks

Authors: Benedict Quartey et al.

Link: https://arxiv.org/abs/2303.05038

Reinforcement learning requires interaction with an environment, which is expensive for robots. This constraint necessitates approaches that work with limited environmental interaction by maximizing the reuse of previous experiences. We propose an approach that maximizes experience reuse while learning to solve a given task by generating and simultaneously learning useful auxiliary tasks. To generate these tasks, we construct an abstract temporal logic representation of the given task and leverage large language models to generate context-aware object embeddings that facilitate object replacements. Counterfactual reasoning and off-policy methods allow us to simultaneously learn these auxiliary tasks while solving the given target task. We combine these insights into a novel framework for multitask reinforcement learning and experimentally show that our generated auxiliary tasks share similar underlying exploration requirements as the given task, thereby maximizing the utility of directed exploration. Our approach allows agents to automatically learn additional useful policies without extra environment interaction.
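The object-substitution step can be sketched as string rewriting over the abstract temporal-logic task. In the paper, candidate replacements come from LLM-generated context-aware object embeddings; here the substitution map is hand-supplied, and the formula syntax is illustrative:

```python
def generate_auxiliary_tasks(task_formula, substitutions):
    """Forge auxiliary tasks by swapping objects in an abstract
    temporal-logic task specification. The substitution map is
    hand-supplied here (the paper derives it from context-aware
    object embeddings produced by a large language model)."""
    tasks = []
    for obj, alternates in substitutions.items():
        if obj in task_formula:
            tasks.extend(task_formula.replace(obj, alt) for alt in alternates)
    return tasks

# "F (...)" is LTL "eventually"; the mug/cup/bowl objects are illustrative.
aux = generate_auxiliary_tasks("F (holding(mug))", {"mug": ["cup", "bowl"]})
```

Because the forged tasks share the original task's temporal structure, they tend to share its exploration requirements, which is what lets them be learned off-policy from the same experience.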

【13】PDSketch: Integrated Planning Domain Programming and Learning

Authors: Jiayuan Mao et al.

Link: https://arxiv.org/abs/2303.05501

This paper studies a model learning and online planning approach towards building flexible and general robots. Specifically, we investigate how to exploit the locality and sparsity structures in the underlying environmental transition model to improve model generalization, data-efficiency, and runtime-efficiency. We present a new domain definition language, named PDSketch. It allows users to flexibly define high-level structures in the transition models, such as object and feature dependencies, in a way similar to how programmers use TensorFlow or PyTorch to specify kernel sizes and hidden dimensions of a convolutional neural network. The details of the transition model will be filled in by trainable neural networks. Based on the defined structures and learned parameters, PDSketch automatically generates domain-independent planning heuristics without additional training. The derived heuristics accelerate the performance-time planning for novel goals.


