Adam's Arxiv FrontPage


Generated on 2024-09-28.


This frontpage is made by scraping arXiv and by running a sentence-model that detects if the abstract describes a paper about a topic of interest. One cool feature: it all pretty much runs via GitHub Actions.
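To give a feel for the sentence-level filtering described above, here is a minimal sketch in Python. It is not the code behind this page: `keyword_score` is a hypothetical stand-in for the real sentence-model, and the cue words and the 0.3 threshold are assumptions picked purely for illustration.

```python
import re
from typing import Callable, List, Tuple


def split_sentences(abstract: str) -> List[str]:
    """Split an abstract on sentence-ending punctuation followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", abstract.strip())
    return [p for p in parts if p]


def keyword_score(sentence: str) -> float:
    """Hypothetical stand-in for the sentence-model: fraction of cue words present."""
    cues = ("dataset", "benchmark", "corpus")  # assumed cue words, not from the source
    text = sentence.lower()
    hits = sum(1 for cue in cues if cue in text)
    return hits / len(cues)


def flag_sentences(
    abstract: str,
    score: Callable[[str], float] = keyword_score,
    threshold: float = 0.3,  # assumed cut-off, not from the source
) -> List[Tuple[str, float]]:
    """Return (sentence, rounded score) pairs whose score reaches the threshold."""
    flagged = []
    for sentence in split_sentences(abstract):
        value = score(sentence)
        if value >= threshold:
            flagged.append((sentence, round(value, 3)))
    return flagged


if __name__ == "__main__":
    demo = "We study robots. We release a new dataset of images."
    for sentence, value in flag_sentences(demo):
        print(f"{value:.3f}  {sentence}")
```

Swapping `keyword_score` for a real classifier only requires passing a different callable as `score`; the rest of the flow stays the same.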


New Datasets

2024-09-26

Uni-Med: A Unified Medical Generalist Foundation Model For Multi-Task Learning Via Connector-MoE

Multi-modal large language models (MLLMs) have shown impressive capabilities as a general-purpose interface for various visual and linguistic tasks. However, building a unified MLLM for multi-task learning in the medical field remains a thorny challenge. To mitigate the tug-of-war problem of multi-modal multi-task optimization, recent advances primarily focus on improving the LLM components, while neglecting the connector that bridges the gap between modalities. In this paper, we introduce Uni-Med, a novel medical generalist foundation model which consists of a universal visual feature extraction module, a connector mixture-of-experts (CMoE) module, and an LLM. Benefiting from the proposed CMoE that leverages a well-designed router with a mixture of projection experts at the connector, Uni-Med achieves efficient solution to the tug-of-war problem and can perform six different medical tasks including question answering, visual question answering, report generation, referring expression comprehension, referring expression generation and image classification. To the best of our knowledge, Uni-Med is the first effort to tackle multi-task interference at the connector. Extensive ablation experiments validate the effectiveness of introducing CMoE under any configuration, with up to an average 8% performance gains. We further provide interpretation analysis of the tug-of-war problem from the perspective of gradient optimization and parameter statistics. Compared to previous state-of-the-art medical MLLMs, Uni-Med achieves competitive or superior evaluation metrics on diverse tasks. Code, data and model will be soon available at GitHub. [0.74]


2024-09-26

JoyType: A Robust Design for Multilingual Visual Text Creation

Generating images with accurately represented text, especially in non-Latin languages, poses a significant challenge for diffusion models. Existing approaches, such as the integration of hint condition diagrams via auxiliary networks (e.g., ControlNet), have made strides towards addressing this issue. However, diffusion models often fall short in tasks requiring controlled text generation, such as specifying particular fonts or producing text in small fonts. In this paper, we introduce a novel approach for multilingual visual text creation, named JoyType, designed to maintain the font style of text during the image generation process. Our methodology begins with assembling a training dataset, JoyType-1M, comprising 1 million pairs of data. [0.904] Each pair includes an image, its description, and glyph instructions corresponding to the font style within the image. We then developed a text control network, Font ControlNet, tasked with extracting font style information to steer the image generation. To further enhance our model's ability to maintain font style, notably in generating small-font text, we incorporated a multi-layer OCR-aware loss into the diffusion process. This enhancement allows JoyType to direct text rendering using low-level descriptors. Our evaluations, based on both visual and accuracy metrics, demonstrate that JoyType significantly outperforms existing state-of-the-art methods. Additionally, JoyType can function as a plugin, facilitating the creation of varied image styles in conjunction with other stable diffusion models on HuggingFace and CivitAI. Our project is open-sourced on https://jdh-algo.github.io/JoyType/.


2024-09-26

Expanding Perspectives on Data Privacy: Insights from Rural Togo

Passively collected "big" data sources are increasingly used to inform critical development policy decisions in low- and middle-income countries. [0.749] While prior work highlights how such approaches may reveal sensitive information, enable surveillance, and centralize power, less is known about the corresponding privacy concerns, hopes, and fears of the people directly impacted by these policies -- people sometimes referred to as experiential experts. To understand the perspectives of experiential experts, we conducted semi-structured interviews with people living in rural villages in Togo shortly after an entirely digital cash transfer program was launched that used machine learning and mobile phone metadata to determine program eligibility. This paper documents participants' privacy concerns surrounding the introduction of big data approaches in development policy. We find that the privacy concerns of our experiential experts differ from those raised by privacy and development domain experts. To facilitate a more robust and constructive account of privacy, we discuss implications for policies and designs that take seriously the privacy concerns raised by both experiential experts and domain experts.


2024-09-26

Enhancing Structured-Data Retrieval with GraphRAG: Soccer Data Case Study

Extracting meaningful insights from large and complex datasets poses significant challenges, particularly in ensuring the accuracy and relevance of retrieved information. Traditional data retrieval methods such as sequential search and index-based retrieval often fail when handling intricate and interconnected data structures, resulting in incomplete or misleading outputs. To overcome these limitations, we introduce Structured-GraphRAG, a versatile framework designed to enhance information retrieval across structured datasets in natural language queries. Structured-GraphRAG utilizes multiple knowledge graphs, which represent data in a structured format and capture complex relationships between entities, enabling a more nuanced and comprehensive retrieval of information. This graph-based approach reduces the risk of errors in language model outputs by grounding responses in a structured format, thereby enhancing the reliability of results. We demonstrate the effectiveness of Structured-GraphRAG by comparing its performance with that of a recently published method using traditional retrieval-augmented generation. Our findings show that Structured-GraphRAG significantly improves query processing efficiency and reduces response times. While our case study focuses on soccer data, the framework's design is broadly applicable, offering a powerful tool for data analysis and enhancing language model applications across various structured domains. [0.796]


2024-09-26

Multimodal Banking Dataset: Understanding Client Needs through Event Sequences

Financial organizations collect a huge amount of data about clients that typically has a temporal (sequential) structure and is collected from various sources (modalities). Due to privacy issues, there are no large-scale open-source multimodal datasets of event sequences, which significantly limits the research in this area. [0.853] In this paper, we present the industrial-scale publicly available multimodal banking dataset, MBD, that contains more than 1.5M corporate clients with several modalities: 950M bank transactions, 1B geo position events, 5M embeddings of dialogues with technical support and monthly aggregated purchases of four bank's products. [0.892] All entries are properly anonymized from real proprietary bank data. Using this dataset, we introduce a novel benchmark with two business tasks: campaigning (purchase prediction in the next month) and matching of clients. We provide numerical results that demonstrate the superiority of our multi-modal baselines over single-modal techniques for each task. As a result, the proposed dataset can open new perspectives and facilitate the future development of practically important large-scale multimodal algorithms for event sequences. [0.858] HuggingFace Link: https://huggingface.co/datasets/ai-lab/MBD Github Link: https://github.com/Dzhambo/MBD


2024-09-26

Preserving logical and functional dependencies in synthetic tabular data

Dependencies among attributes are a common aspect of tabular data. However, whether existing tabular data generation algorithms preserve these dependencies while generating synthetic data is yet to be explored. In addition to the existing notion of functional dependencies, we introduce the notion of logical dependencies among the attributes in this article. Moreover, we provide a measure to quantify logical dependencies among attributes in tabular data. Utilizing this measure, we compare several state-of-the-art synthetic data generation algorithms and test their capability to preserve logical and functional dependencies on several publicly available datasets. [0.746] We demonstrate that currently available synthetic tabular data generation algorithms do not fully preserve functional dependencies when they generate synthetic datasets. In addition, we also showed that some tabular synthetic data generation models can preserve inter-attribute logical dependencies. Our review and comparison of the state-of-the-art reveal research needs and opportunities to develop task-specific synthetic tabular data generation models.


2024-09-26

BeanCounter: A low-toxicity, large-scale, and open dataset of business-oriented text

Many of the recent breakthroughs in language modeling have resulted from scaling effectively the same model architecture to larger datasets. In this vein, recent work has highlighted performance gains from increasing training dataset size and quality, suggesting a need for novel sources of large-scale datasets. In this work, we introduce BeanCounter, a public dataset consisting of more than 159B tokens extracted from businesses' disclosures. [0.857] We show that this data is indeed novel: less than 0.1% of BeanCounter appears in Common Crawl-based datasets and it is an order of magnitude larger than datasets relying on similar sources. Given the data's provenance, we hypothesize that BeanCounter is comparatively more factual and less toxic than web-based datasets. Exploring this hypothesis, we find that many demographic identities occur with similar prevalence in BeanCounter but with significantly less toxic context relative to other datasets. To demonstrate the utility of BeanCounter, we evaluate and compare two LLMs continually pre-trained on BeanCounter with their base models. We find an 18-33% reduction in toxic generation and improved performance within the finance domain for the continually pretrained models. Collectively, our work suggests that BeanCounter is a novel source of low-toxicity and high-quality domain-specific data with sufficient scale to train multi-billion parameter LLMs.


2024-09-26

Implementing a Nordic-Baltic Federated Health Data Network: a case report

Background: Centralized collection and processing of healthcare data across national borders pose significant challenges, including privacy concerns, data heterogeneity and legal barriers. To address some of these challenges, we formed an interdisciplinary consortium to develop a federated health data network, comprised of six institutions across five countries, to facilitate Nordic-Baltic cooperation on secondary use of health data. [0.772] The objective of this report is to offer early insights into our experiences developing this network. Methods: We used a mixed-method approach, combining both experimental design and implementation science to evaluate the factors affecting the implementation of our network. Results: Technically, our experiments indicate that the network functions without significant performance degradation compared to centralized simulation. Conclusion: While use of interdisciplinary approaches holds a potential to solve challenges associated with establishing such collaborative networks, our findings turn the spotlight on the uncertain regulatory landscape playing catch up and the significant operational costs.


2024-09-26

Atlas-Chat: Adapting Large Language Models for Low-Resource Moroccan Arabic Dialect

We introduce Atlas-Chat, the first-ever collection of large language models specifically developed for dialectal Arabic. Focusing on Moroccan Arabic, also known as Darija, we construct our instruction dataset by consolidating existing Darija language resources, creating novel datasets both manually and synthetically, and translating English instructions with stringent quality control. [0.78] Atlas-Chat-9B and 2B models, fine-tuned on the dataset, exhibit superior ability in following Darija instructions and performing standard NLP tasks. Notably, our models outperform both state-of-the-art and Arabic-specialized LLMs like LLaMa, Jais, and AceGPT, e.g., achieving a 13% performance boost over a larger 13B model on DarijaMMLU, in our newly introduced evaluation suite for Darija covering both discriminative and generative tasks. Furthermore, we perform an experimental analysis of various fine-tuning strategies and base model choices to determine optimal configurations. All our resources are publicly accessible, and we believe our work offers comprehensive design methodologies of instruction-tuning for low-resource language variants, which are often neglected in favor of data-rich languages by contemporary LLMs.


2024-09-26

Role-RL: Online Long-Context Processing with Role Reinforcement Learning for Distinct LLMs in Their Optimal Roles

Large language models (LLMs) with long-context processing are still challenging because of their implementation complexity, training efficiency and data sparsity. To address this issue, a new paradigm named Online Long-context Processing (OLP) is proposed when we process a document of unlimited length, which typically occurs in the information reception and organization of diverse streaming media such as automated news reporting, live e-commerce, and viral short videos. Moreover, a dilemma was often encountered when we tried to select the most suitable LLM from a large number of LLMs amidst explosive growth aiming for outstanding performance, affordable prices, and short response delays. In view of this, we also develop Role Reinforcement Learning (Role-RL) to automatically deploy different LLMs in their respective roles within the OLP pipeline according to their actual performance. Extensive experiments are conducted on our OLP-MINI dataset and it is found that OLP with Role-RL framework achieves OLP benchmark with an average recall rate of 93.2% and the LLM cost saved by 79.4%. The code and dataset are publicly available at: https://anonymous.4open.science/r/Role-RL. [0.912]


2024-09-26

Visual Data Diagnosis and Debiasing with Concept Graphs

The widespread success of deep learning models today is owed to the curation of extensive datasets significant in size and complexity. However, such models frequently pick up inherent biases in the data during the training process, leading to unreliable predictions. Diagnosing and debiasing datasets is thus a necessity to ensure reliable model performance. In this paper, we present CONBIAS, a novel framework for diagnosing and mitigating Concept co-occurrence Biases in visual datasets. CONBIAS represents visual datasets as knowledge graphs of concepts, enabling meticulous analysis of spurious concept co-occurrences to uncover concept imbalances across the whole dataset. Moreover, we show that by employing a novel clique-based concept balancing strategy, we can mitigate these imbalances, leading to enhanced performance on downstream tasks. Extensive experiments show that data augmentation based on a balanced concept distribution augmented by CONBIAS improves generalization performance across multiple datasets compared to state-of-the-art methods. We will make our code and data publicly available. [0.777]


2024-09-26

SKT: Integrating State-Aware Keypoint Trajectories with Vision-Language Models for Robotic Garment Manipulation

Automating garment manipulation poses a significant challenge for assistive robotics due to the diverse and deformable nature of garments. Traditional approaches typically require separate models for each garment type, which limits scalability and adaptability. In contrast, this paper presents a unified approach using vision-language models (VLMs) to improve keypoint prediction across various garment categories. By interpreting both visual and semantic information, our model enables robots to manage different garment states with a single model. We created a large-scale synthetic dataset using advanced simulation techniques, allowing scalable training without extensive real-world data. [0.856] Experimental results indicate that the VLM-based method significantly enhances keypoint detection accuracy and task success rates, providing a more flexible and general solution for robotic garment manipulation. In addition, this research also underscores the potential of VLMs to unify various garment manipulation tasks within a single framework, paving the way for broader applications in home automation and assistive robotics for future.


2024-09-26

E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding

Recent advances in Video Large Language Models (Video-LLMs) have demonstrated their great potential in general-purpose video understanding. To verify the significance of these models, a number of benchmarks have been proposed to diagnose their capabilities in different scenarios. However, existing benchmarks merely evaluate models through video-level question-answering, lacking fine-grained event-level assessment and task diversity. To fill this gap, we introduce E.T. Bench (Event-Level & Time-Sensitive Video Understanding Benchmark), a large-scale and high-quality benchmark for open-ended event-level video understanding. Categorized within a 3-level task taxonomy, E.T. Bench encompasses 7.3K samples under 12 tasks with 7K videos (251.4h total length) under 8 domains, providing comprehensive evaluations. We extensively evaluated 8 Image-LLMs and 12 Video-LLMs on our benchmark, and the results reveal that state-of-the-art models for coarse-level (video-level) understanding struggle to solve our fine-grained tasks, e.g., grounding event-of-interests within videos, largely due to the short video context length, improper time representations, and lack of multi-event training data. Focusing on these issues, we further propose a strong baseline model, E.T. Chat, together with an instruction-tuning dataset E.T. Instruct 164K tailored for fine-grained event-level understanding. [0.78] Our simple but effective solution demonstrates superior performance in multiple scenarios.


2024-09-25

SynChart: Synthesizing Charts from Language Models

With the release of GPT-4V(O), its use in generating pseudo labels for multi-modality tasks has gained significant popularity. However, it is still a secret how to build such advanced models from its base large language models (LLMs). This work explores the potential of using LLMs alone for data generation and develop competitive multi-modality models focusing on chart understanding. We construct a large-scale chart dataset, SynChart, which contains approximately 4 million diverse chart images with over 75 million dense annotations, including data tables, code, descriptions, and question-answer sets. [0.949] We trained a 4.2B chart-expert model using this dataset and achieve near-GPT-4O performance on the ChartQA task, surpassing GPT-4V.


2024-09-25

APILOT: Navigating Large Language Models to Generate Secure Code by Sidestepping Outdated API Pitfalls

With the rapid development of large language models (LLMs), their applications have expanded into diverse fields, such as code assistance. However, the substantial size of LLMs makes their training highly resource- and time-intensive, rendering frequent retraining or updates impractical. Consequently, time-sensitive data can become outdated, potentially misleading LLMs in time-aware tasks. For example, new vulnerabilities are discovered in various programs every day. Without updating their knowledge, LLMs may inadvertently generate code that includes these newly discovered vulnerabilities. Current strategies, such as prompt engineering and fine-tuning, do not effectively address this issue. To address this issue, we propose a solution, named APILOT, which maintains a realtime, quickly updatable dataset of outdated APIs. [0.821] Additionally, APILOT utilizes an augmented generation method that leverages this dataset to navigate LLMs in generating secure, version-aware code. We conducted a comprehensive evaluation to measure the effectiveness of APILOT in reducing the incidence of outdated API recommendations across seven different state-of-the-art LLMs. The evaluation results indicate that APILOT can reduce outdated code recommendations by 89.42% on average with limited performance overhead. Interestingly, while enhancing security, APILOT also improves the usability of the code generated by LLMs, showing an average increase of 27.54% in usability. This underscores APILOT's dual capability to enhance both the safety and practical utility of code suggestions in contemporary software development environments.


2024-09-25

SelectiveKD: A semi-supervised framework for cancer detection in DBT through Knowledge Distillation and Pseudo-labeling

When developing Computer Aided Detection (CAD) systems for Digital Breast Tomosynthesis (DBT), the complexity arising from the volumetric nature of the modality poses significant technical challenges for obtaining large-scale accurate annotations. Without access to large-scale annotations, the resulting model may not generalize to different domains. Given the costly nature of obtaining DBT annotations, how to effectively increase the amount of data used for training DBT CAD systems remains an open challenge. In this paper, we present SelectiveKD, a semi-supervised learning framework for building cancer detection models for DBT, which only requires a limited number of annotated slices to reach high performance. We achieve this by utilizing unlabeled slices available in a DBT stack through a knowledge distillation framework in which the teacher model provides a supervisory signal to the student model for all slices in the DBT volume. Our framework mitigates the potential noise in the supervisory signal from a sub-optimal teacher by implementing a selective dataset expansion strategy using pseudo labels. We evaluate our approach with a large-scale real-world dataset of over 10,000 DBT exams collected from multiple device manufacturers and locations. [0.815] The resulting SelectiveKD process effectively utilizes unannotated slices from a DBT stack, leading to significantly improved cancer classification performance (AUC) and generalization performance.


2024-09-25

Claim-Guided Textual Backdoor Attack for Practical Applications

Recent advances in natural language processing and the increased use of large language models have exposed new security vulnerabilities, such as backdoor attacks. Previous backdoor attacks require input manipulation after model distribution to activate the backdoor, posing limitations in real-world applicability. Addressing this gap, we introduce a novel Claim-Guided Backdoor Attack (CGBA), which eliminates the need for such manipulations by utilizing inherent textual claims as triggers. CGBA leverages claim extraction, clustering, and targeted training to trick models to misbehave on targeted claims without affecting their performance on clean data. CGBA demonstrates its effectiveness and stealthiness across various datasets and models, significantly enhancing the feasibility of practical backdoor attacks. Our code and data will be available at https://github.com/PaperCGBA/CGBA. [0.843]


2024-09-25

Domain-Independent Automatic Generation of Descriptive Texts for Time-Series Data

Due to scarcity of time-series data annotated with descriptive texts, training a model to generate descriptive texts for time-series data is challenging. In this study, we propose a method to systematically generate domain-independent descriptive texts from time-series data. We identify two distinct approaches for creating pairs of time-series data and descriptive texts: the forward approach and the backward approach. By implementing the novel backward approach, we create the Temporal Automated Captions for Observations (TACO) dataset. [0.831] Experimental results demonstrate that a contrastive learning based model trained using the TACO dataset is capable of generating descriptive texts for time-series data in novel domains.


2024-09-25

Non-stationary BERT: Exploring Augmented IMU Data For Robust Human Activity Recognition

Human Activity Recognition (HAR) has gained great attention from researchers due to the popularity of mobile devices and the need to observe users' daily activity data for better human-computer interaction. In this work, we collect a human activity recognition dataset called OPPOHAR consisting of phone IMU data. [0.825] To facilitate the employment of HAR system in mobile phone and to achieve user-specific activity recognition, we propose a novel light-weight network called Non-stationary BERT with a two-stage training method. We also propose a simple yet effective data augmentation method to explore the deeper relationship between the accelerator and gyroscope data from the IMU. The network achieves the state-of-the-art performance testing on various activity recognition datasets and the data augmentation method demonstrates its wide applicability.


2024-09-25

CodeInsight: A Curated Dataset of Practical Coding Solutions from Stack Overflow

We introduce a novel dataset tailored for code generation, aimed at aiding developers in common tasks. [0.873] Our dataset provides examples that include a clarified intent, code snippets associated, and an average of three related unit tests. It encompasses a range of libraries such as Pandas, Numpy, and Regex, along with more than 70 standard libraries in Python code derived from Stack Overflow. Comprising 3,409 crafted examples by Python experts, our dataset is designed for both model finetuning and standalone evaluation. [0.749] To complete unit tests evaluation, we categorize examples in order to get more fine grained analysis, enhancing the understanding of models' strengths and weaknesses in specific coding tasks. The examples have been refined to reduce data contamination, a process confirmed by the performance of three leading models: Mistral 7B, CodeLLaMa 13B, and Starcoder 15B. We further investigate data-contamination testing GPT-4 performance on a part of our dataset. The benchmark can be accessed at https://github.com/NathanaelBeau/CodeInsight.


2024-09-25

EventHDR: from Event to High-Speed HDR Videos and Beyond

Event cameras are innovative neuromorphic sensors that asynchronously capture the scene dynamics. Due to the event-triggering mechanism, such cameras record event streams with much shorter response latency and higher intensity sensitivity compared to conventional cameras. On the basis of these features, previous works have attempted to reconstruct high dynamic range (HDR) videos from events, but have either suffered from unrealistic artifacts or failed to provide sufficiently high frame rates. In this paper, we present a recurrent convolutional neural network that reconstructs high-speed HDR videos from event sequences, with a key frame guidance to prevent potential error accumulation caused by the sparse event data. Additionally, to address the problem of severely limited real dataset, we develop a new optical system to collect a real-world dataset with paired high-speed HDR videos and event streams, facilitating future research in this field. [0.784] Our dataset provides the first real paired dataset for event-to-HDR reconstruction, avoiding potential inaccuracies from simulation strategies. Experimental results demonstrate that our method can generate high-quality, high-speed HDR videos. We further explore the potential of our work in cross-camera reconstruction and downstream computer vision tasks, including object detection, panoramic segmentation, optical flow estimation, and monocular depth estimation under HDR scenarios.


2024-09-25

GeoBiked: A Dataset with Geometric Features and Automated Labeling Techniques to Enable Deep Generative Models in Engineering Design

We provide a dataset for enabling Deep Generative Models (DGMs) in engineering design and propose methods to automate data labeling by utilizing large-scale foundation models. [0.787] GeoBiked is curated to contain 4 355 bicycle images, annotated with structural and technical features and is used to investigate two automated labeling techniques: The utilization of consolidated latent features (Hyperfeatures) from image-generation models to detect geometric correspondences (e.g. the position of the wheel center) in structural images and the generation of diverse text descriptions for structural images. GPT-4o, a vision-language-model (VLM), is instructed to analyze images and produce diverse descriptions aligned with the system-prompt. By representing technical images as Diffusion-Hyperfeatures, drawing geometric correspondences between them is possible. The detection accuracy of geometric points in unseen samples is improved by presenting multiple annotated source images. GPT-4o has sufficient capabilities to generate accurate descriptions of technical images. Grounding the generation only on images leads to diverse descriptions but causes hallucinations, while grounding it on categorical labels restricts the diversity. Using both as input balances creativity and accuracy. Successfully using Hyperfeatures for geometric correspondence suggests that this approach can be used for general point-detection and annotation tasks in technical images. Labeling such images with text descriptions using VLMs is possible, but dependent on the model's detection capabilities, careful prompt-engineering and the selection of input information. Applying foundation models in engineering design is largely unexplored. We aim to bridge this gap with a dataset to explore training, finetuning and conditioning DGMs in this field and suggesting approaches to bootstrap foundation models to process technical images.


2024-09-25

ControlCity: A Multimodal Diffusion Model Based Approach for Accurate Geospatial Data Generation and Urban Morphology Analysis

Volunteer Geographic Information (VGI), with its rich variety, large volume, rapid updates, and diverse sources, has become a critical source of geospatial data. However, VGI data from platforms like OSM exhibit significant quality heterogeneity across different data types, particularly with urban building data. To address this, we propose a multi-source geographic data transformation solution, utilizing accessible and complete VGI data to assist in generating urban building footprint data. We also employ a multimodal data generation framework to improve accuracy. First, we introduce a pipeline for constructing an 'image-text-metadata-building footprint' dataset, primarily based on road network data and supplemented by other multimodal data. [0.935] We then present ControlCity, a geographic data transformation method based on a multimodal diffusion model. This method first uses a pre-trained text-to-image model to align text, metadata, and building footprint data. An improved ControlNet further integrates road network and land-use imagery, producing refined building footprint data. Experiments across 22 global cities demonstrate that ControlCity successfully simulates real urban building patterns, achieving state-of-the-art performance. Specifically, our method achieves an average FID score of 50.94, reducing error by 71.01% compared to leading methods, and a MIoU score of 0.36, an improvement of 38.46%. Additionally, our model excels in tasks like urban morphology transfer, zero-shot city generation, and spatial data completeness assessment. In the zero-shot city task, our method accurately predicts and generates similar urban structures, demonstrating strong generalization. This study confirms the effectiveness of our approach in generating urban building footprint data and capturing complex city characteristics.

link

2024-09-25

BitQ: Tailoring Block Floating Point Precision for Improved DNN Efficiency on Resource-Constrained Devices

Deep neural networks (DNNs) are powerful for cognitive tasks such as image classification, object detection, and scene segmentation. One drawback, however, is their significantly high computational complexity and memory consumption, which makes them infeasible to run in real time on embedded platforms because of the limited hardware resources. Block floating point (BFP) quantization is one of the representative compression approaches for reducing the memory and computational burden, owing to its capability to effectively capture the broad data distribution of DNN models. Unfortunately, prior works on BFP-based quantization empirically choose the block size and the precision that preserve accuracy. In this paper, we develop a BFP-based bitwidth-aware analytical modeling framework (called "BitQ") for the best BFP implementation of DNN inference on embedded platforms. We formulate and resolve an optimization problem to identify the optimal BFP block size and bitwidth distribution by the trade-off of both accuracy and performance loss. Experimental results show that, compared with an equal bitwidth setting, the BFP DNNs with optimized bitwidth allocation provide efficient computation, preserving accuracy on famous benchmarks. The source code and data are available at https://github.com/Cheliosoops/BitQ. 0.846

link
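
The block floating point idea mentioned in the BitQ abstract above can be illustrated with a minimal sketch. This is not the BitQ framework itself: the function name `bfp_quantize`, the default block size, and the default mantissa width are assumptions chosen for illustration. Each block of values shares one exponent, taken from the largest magnitude in the block, and every value keeps only a short signed mantissa.

```python
import math


def bfp_quantize(values, block_size=4, mantissa_bits=4):
    """Quantize floats block by block, with one shared exponent per block.

    Hypothetical helper for illustration; block_size and mantissa_bits
    are the two knobs that the abstract says BitQ optimizes.
    """
    out = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        max_abs = max(abs(v) for v in block)
        if max_abs == 0.0:
            # An all-zero block needs no exponent.
            out.extend(0.0 for _ in block)
            continue
        # frexp gives max_abs = m * 2**e with 0.5 <= m < 1, so max_abs < 2**e.
        shared_exp = math.frexp(max_abs)[1]
        # One mantissa step; mantissa_bits includes the sign bit.
        scale = 2.0 ** (shared_exp - (mantissa_bits - 1))
        lo = -(2 ** (mantissa_bits - 1))
        hi = 2 ** (mantissa_bits - 1) - 1
        for v in block:
            mantissa = max(lo, min(hi, round(v / scale)))
            out.append(mantissa * scale)
    return out


if __name__ == "__main__":
    weights = [1.0, 0.5, 0.25, 0.1, 0.02, -0.01, 0.03, 0.0]
    print(bfp_quantize(weights, block_size=4, mantissa_bits=4))
```

Small values that share a block with a large value lose precision (0.1 collapses to 0.0 in the first block here), which is why the choice of block size and bitwidth trades accuracy against cost.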

2024-09-25

Text2CAD: Generating Sequential CAD Models from Beginner-to-Expert Level Text Prompts

Prototyping complex computer-aided design (CAD) models in modern software can be very time-consuming. This is due to the lack of intelligent systems that can quickly generate simpler intermediate parts. We propose Text2CAD, the first AI framework for generating text-to-parametric CAD models using designer-friendly instructions for all skill levels. Furthermore, we introduce a data annotation pipeline for generating text prompts based on natural language instructions for the DeepCAD dataset using Mistral and LLaVA-NeXT. 0.782 The dataset contains $\sim170$K models and $\sim660$K text annotations, from abstract CAD descriptions (e.g., generate two concentric cylinders) to detailed specifications (e.g., draw two circles with center $(x,y)$ and radius $r_{1}$, $r_{2}$, and extrude along the normal by $d$...). 0.844 Within the Text2CAD framework, we propose an end-to-end transformer-based auto-regressive network to generate parametric CAD models from input texts. We evaluate the performance of our model through a mixture of metrics, including visual quality, parametric precision, and geometrical accuracy. Our proposed framework shows great potential in AI-aided design applications. Our source code and annotations will be publicly available.

link

2024-09-25

Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale

Large language model pre-training has traditionally relied on human experts to craft heuristics for improving the corpora quality, resulting in numerous rules developed to date. However, these rules lack the flexibility to address the unique characteristics of individual examples effectively. Meanwhile, applying tailored rules to every example is impractical for human experts. In this paper, we demonstrate that even small language models, with as few as 0.3B parameters, can exhibit substantial data refining capabilities comparable to those of human experts. We introduce Programming Every Example (ProX), a novel framework that treats data refinement as a programming task, enabling models to refine corpora by generating and executing fine-grained operations, such as string normalization, for each individual example at scale. Experimental results show that models pre-trained on ProX-curated data outperform either original data or data filtered by other selection methods by more than 2% across various downstream benchmarks. Its effectiveness spans various model sizes and pre-training corpora, including C4, RedPajama-V2, and FineWeb. Furthermore, ProX exhibits significant potential in domain-specific continual pre-training: without domain-specific design, models trained on OpenWebMath refined by ProX outperform human-crafted rule-based methods, improving average accuracy by 7.6% over Mistral-7B, with 14.6% for Llama-2-7B and 20.3% for CodeLlama-7B, all within 10B tokens to be comparable to models like Llemma-7B trained on 200B tokens. Further analysis highlights that ProX significantly saves training FLOPs, offering a promising path for efficient LLM pre-training. We are open-sourcing ProX with >100B corpus, models, and sharing all training and implementation details for reproducible research and future innovation. 0.802 Code: https://github.com/GAIR-NLP/ProX

link

2024-09-25

PokeFlex: Towards a Real-World Dataset of Deformable Objects for Robotic Manipulation

Advancing robotic manipulation of deformable objects can enable automation of repetitive tasks across multiple industries, from food processing to textiles and healthcare. Yet robots struggle with the high dimensionality of deformable objects and their complex dynamics. While data-driven methods have shown potential for solving manipulation tasks, their application in the domain of deformable objects has been constrained by the lack of data. To address this, we propose PokeFlex, a pilot dataset featuring real-world 3D mesh data of actively deformed objects, together with the corresponding forces and torques applied by a robotic arm, using a simple poking strategy. 0.768 Deformations are captured with a professional volumetric capture system that allows for complete 360-degree reconstruction. The PokeFlex dataset consists of five deformable objects with varying stiffness and shapes. 0.808 Additionally, we leverage the PokeFlex dataset to train a vision model for online 3D mesh reconstruction from a single image and a template mesh. We refer readers to the supplementary material and to our website ( https://pokeflex-dataset.github.io/ ) for demos and examples of our dataset. 0.917

link

2024-09-25

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models

Today's most advanced multimodal models remain proprietary. The strongest open-weight models rely heavily on synthetic data from proprietary VLMs to achieve good performance, effectively distilling these closed models into open ones. As a result, the community is still missing foundational knowledge about how to build performant VLMs from scratch. We present Molmo, a new family of VLMs that are state-of-the-art in their class of openness. Our key innovation is a novel, highly detailed image caption dataset collected entirely from human annotators using speech-based descriptions. 0.761 To enable a wide array of user interactions, we also introduce a diverse dataset mixture for fine-tuning that includes in-the-wild Q&A and innovative 2D pointing data. The success of our approach relies on careful choices for the model architecture details, a well-tuned training pipeline, and, most critically, the quality of our newly collected datasets, all of which will be released. The best-in-class 72B model within the Molmo family not only outperforms others in the class of open weight and data models but also compares favorably against proprietary systems like GPT-4o, Claude 3.5, and Gemini 1.5 on both academic benchmarks and human evaluation. We will be releasing all of our model weights, captioning and fine-tuning data, and source code in the near future. Select model weights, inference code, and demo are available at https://molmo.allenai.org.

link

2024-09-24

Strategies for Improving NL-to-FOL Translation with LLMs: Data Generation, Incremental Fine-Tuning, and Verification

Logical reasoning is a fundamental task in natural language processing that presents significant challenges to Large Language Models (LLMs). The inherent characteristics of logical reasoning make it well-suited for symbolic representations such as first-order logic (FOL). Research in symbolic logical reasoning explored FOL generation using state-of-the-art LLMs (i.e., GPT-4) to produce FOL translations of natural language (NL) statements, but errors in translation are usually not the focus. We address this by categorizing the translation errors in FOL statements generated by LLMs. To make progress towards improving the quality of FOL translations for smaller language models such as LLaMA-2 13B and Mistral 7B, we create ProofFOL, a high-quality FOL-annotated subset of the ProofWriter dataset using GPT-4o. The models fine-tuned on this silver standard data achieve a significant gain in performance when compared to larger language models such as LLaMA-2 70B. In addition to improving the model using large data, we also tackle the issue of data scarcity and introduce an incremental framework encompassing data augmentation and verification steps. In the augmentation process, a single pair of (premises, conclusion) is split into multiple new instances based on the predicates and FOLs. This data is used for fine-tuning, and inference with the resulting model generates FOLs with fewer errors than the model trained on the original data. Our investigation of the translation errors leads to the generation of a perturbation dataset, which is used to train a verifier that corrects potential syntactic and semantic FOL translation errors. We demonstrate an efficient method for making the most of a limited existing human-annotated dataset. 0.932 Our results show state-of-the-art performance for ProofWriter and ProntoQA datasets using ProofFOL on LLaMA-2 and Mistral models.

link

2024-09-24

PDT: Uav Target Detection Dataset for Pests and Diseases Tree

UAVs emerge as the optimal carriers for visual weed identification and integrated pest and disease management in crops. However, the absence of specialized datasets impedes the advancement of model development in this domain. To address this, we have developed the Pests and Diseases Tree dataset (PDT dataset). 0.928 PDT dataset represents the first high-precision UAV-based dataset for targeted detection of tree pests and diseases, which is collected in real-world operational environments and aims to fill the gap in available datasets for this field. 0.884 Moreover, by aggregating public datasets and network data, we further introduced the Common Weed and Crop dataset (CWC dataset) to address the challenge of inadequate classification capabilities of test models within datasets for this field. Finally, we propose the YOLO-Dense Pest (YOLO-DP) model for high-precision object detection of weed, pest, and disease crop images. We re-evaluate the state-of-the-art detection models with our proposed PDT dataset and CWC dataset, showing the completeness of the dataset and the effectiveness of the YOLO-DP. The proposed PDT dataset, CWC dataset, and YOLO-DP model are presented at https://github.com/RuiXing123/PDT_CWC_YOLO-DP. 0.776

link

2024-09-24

SoMaSLAM: 2D Graph SLAM for Sparse Range Sensing with Soft Manhattan World Constraints

We propose a graph SLAM algorithm for sparse range sensing that incorporates a soft Manhattan world utilizing landmark-landmark constraints. Sparse range sensing is necessary for tiny robots that do not have the luxury of using heavy and expensive sensors. Existing SLAM methods dealing with sparse range sensing lack accuracy and accumulate drift error over time due to limited access to data points. Algorithms that cover this flaw using structural regularities, such as the Manhattan world (MW), have shortcomings when mapping real-world environments that do not coincide with the rules. We propose SoMaSLAM, a 2D graph SLAM designed for tiny robots with sparse range sensing. Our approach effectively maps sparse range data without enforcing strict structural regularities and maintains an adaptive graph. We implement the MW assumption as soft constraints, which we refer to as a soft Manhattan world. We propose novel soft landmark-landmark constraints to incorporate the soft MW into graph SLAM. Through extensive evaluation, we demonstrate that our proposed SoMaSLAM method improves localization accuracy on diverse datasets and is flexible enough to be used in the real world. We release our source code and sparse range datasets at https://SoMaSLAM.github.io/. 0.737

link

2024-09-24

The Roles of Generative Artificial Intelligence in Internet of Electric Vehicles

With the advancement of generative artificial intelligence (GenAI) models, their capability to generate content is seeing significant enhancement, leading to widespread applications in the field of data generation and forecasting. Furthermore, GenAI has strong capabilities in data modeling and analysis, which enhances Internet of electric vehicles (IoEV) applications in various aspects. In this paper, we investigate and survey applications of GenAI in the IoEV. Specifically, we categorize GenAI for IoEV into four different layers, namely the EV's battery layer, the individual electric vehicle (EV) layer, the smart grid with EV layer, and the security layer. We first introduce various GenAI techniques used in each layer of IoEV applications. Subsequently, public datasets available for training the GenAI models are summarized. 0.707 Finally, we provide recommendations for future directions. This survey not only categorizes the applications of GenAI in IoEV across different layers but also serves as a valuable resource for researchers and practitioners by highlighting the design and implementation challenges within each layer. Furthermore, it provides a roadmap for future research directions, enabling the development of more robust and efficient IoEV systems through the integration of advanced GenAI techniques.

link

2024-09-24

Towards Universal Large-Scale Foundational Model for Natural Gas Demand Forecasting

In the context of global energy strategy, accurate natural gas demand forecasting is crucial for ensuring efficient resource allocation and operational planning. Traditional forecasting methods struggle to cope with the growing complexity and variability of gas consumption patterns across diverse industries and commercial sectors. To address these challenges, we propose the first foundation model specifically tailored for natural gas demand forecasting. Foundation models, known for their ability to generalize across tasks and datasets, offer a robust solution to the limitations of traditional methods, such as the need for separate models for different customer segments and their limited generalization capabilities. Our approach leverages contrastive learning to improve prediction accuracy in real-world scenarios, particularly by tackling issues such as noise in historical consumption data and the potential misclassification of similar data samples, which can lead to degradation in the quality of the representation and thus the accuracy of downstream forecasting tasks. By integrating advanced noise filtering techniques within the contrastive learning framework, our model enhances the quality of learned representations, leading to more accurate predictions. Furthermore, the model undergoes industry-specific fine-tuning during pretraining, enabling it to better capture the unique characteristics of gas consumption across various sectors. We conducted extensive experiments using a large-scale dataset from ENN Group, which includes data from over 10,000 industrial, commercial, and welfare-related customers across multiple regions. 0.717 Our model outperformed existing state-of-the-art methods, demonstrating a relative improvement in MSE by 3.68% and in MASE by 6.15% compared to the best available model.

link
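
The abstract above reports relative improvements in MSE and MASE without defining the metrics. A minimal sketch of the standard definitions follows, assuming MASE scales the forecast's mean absolute error by the in-sample error of a naive "repeat the previous value" forecast; the function names and the toy numbers are hypothetical and are not taken from the paper.

```python
def mse(actual, forecast):
    """Mean squared error between two equal-length sequences."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


def mase(actual, forecast, train, season=1):
    """Mean absolute scaled error.

    The forecast MAE is divided by the MAE of a naive forecast on the
    training series (each value predicted by the one `season` steps back).
    """
    mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    naive_errors = [abs(train[i] - train[i - season])
                    for i in range(season, len(train))]
    return mae / (sum(naive_errors) / len(naive_errors))


def relative_improvement(baseline, ours):
    """Fractional reduction of an error metric relative to a baseline."""
    return (baseline - ours) / baseline


if __name__ == "__main__":
    train = [10, 12, 11, 13]   # toy history, made up for illustration
    actual = [14, 15]
    forecast = [13, 16]
    print(mse(actual, forecast), mase(actual, forecast, train))
    print(relative_improvement(2.0, 1.5))
```

A MASE below 1 means the forecast beats the naive baseline on average, which makes the metric comparable across customers with very different consumption scales.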

2024-09-24

iGAiVA: Integrated Generative AI and Visual Analytics in a Machine Learning Workflow for Text Classification

In developing machine learning (ML) models for text classification, one common challenge is that the collected data is often not ideally distributed, especially when new classes are introduced in response to changes of data and tasks. In this paper, we present a solution for using visual analytics (VA) to guide the generation of synthetic data using large language models. 0.788 As VA enables model developers to identify data-related deficiency, data synthesis can be targeted to address such deficiency. We discuss different types of data deficiency, describe different VA techniques for supporting their identification, and demonstrate the effectiveness of targeted data synthesis in improving model accuracy. In addition, we present a software tool, iGAiVA, which maps four groups of ML tasks into four VA views, integrating generative AI and VA into an ML workflow for developing and improving text classification models.

link

2024-09-24

Benchmarking Robustness of Endoscopic Depth Estimation with Synthetically Corrupted Data

Accurate depth perception is crucial for patient outcomes in endoscopic surgery, yet it is compromised by image distortions common in surgical settings. To tackle this issue, our study presents a benchmark for assessing the robustness of endoscopic depth estimation models. We have compiled a comprehensive dataset that reflects real-world conditions, incorporating a range of synthetically induced corruptions at varying severity levels. 0.792 To further this effort, we introduce the Depth Estimation Robustness Score (DERS), a novel metric that combines measures of error, accuracy, and robustness to meet the multifaceted requirements of surgical applications. This metric acts as a foundational element for evaluating performance, establishing a new paradigm for the comparative analysis of depth estimation technologies. Additionally, we set forth a benchmark focused on robustness for the evaluation of depth estimation in endoscopic surgery, with the aim of driving progress in model refinement. A thorough analysis of two monocular depth estimation models using our framework reveals crucial information about their reliability under adverse conditions. Our results emphasize the essential need for algorithms that can tolerate data corruption, thereby advancing discussions on improving model robustness. The impact of this research transcends theoretical frameworks, providing concrete gains in surgical precision and patient safety. This study establishes a benchmark for the robustness of depth estimation and serves as a foundation for developing more resilient surgical support technologies. Code is available at https://github.com/lofrienger/EndoDepthBenchmark.

link

2024-09-24

Neuromorphic Drone Detection: an Event-RGB Multimodal Approach

In recent years, drone detection has quickly become a subject of extreme interest: the potential for fast-moving objects of contained dimensions to be used for malicious intents or even terrorist attacks has drawn attention to the necessity for precise and resilient systems for detecting and identifying such elements. While extensive literature and works exist on object detection based on RGB data, it is also critical to recognize the limits of such modality when applied to UAV detection. Detecting drones indeed poses several challenges such as fast-moving objects and scenes with a high dynamic range or, even worse, scarce illumination levels. Neuromorphic cameras, on the other hand, can retain precise and rich spatio-temporal information in situations that are challenging for RGB cameras. They are resilient to both high-speed moving objects and scarce illumination settings, but are prone to a rapid loss of information when the objects in the scene are static. In this context, we present a novel model for integrating both domains together, leveraging multimodal data to take advantage of the best of both worlds. To this end, we also release NeRDD (Neuromorphic-RGB Drone Detection), a novel spatio-temporally synchronized Event-RGB Drone detection dataset of more than 3.5 hours of multimodal annotated recordings. 0.789

link

2024-09-24

Expert-level vision-language foundation model for real-world radiology and comprehensive evaluation

Radiology is a vital and complex component of modern clinical workflow and covers many tasks. Recently, vision-language (VL) foundation models in medicine have shown potential in processing multimodal information, offering a unified solution for various radiology tasks. However, existing studies either pre-trained VL models on natural data or did not fully integrate vision-language architecture and pretraining, often neglecting the unique multimodal complexity in radiology images and their textual contexts. Additionally, their practical applicability in real-world scenarios remains underexplored. Here, we present RadFound, a large and open-source vision-language foundation model tailored for radiology, that is trained on the most extensive dataset of over 8.1 million images and 250,000 image-text pairs, covering 19 major organ systems and 10 imaging modalities. 0.848 To establish expert-level multimodal perception and generation capabilities, RadFound introduces an enhanced vision encoder to capture intra-image local features and inter-image contextual information, and a unified cross-modal learning design tailored to radiology. To fully assess the models' capability, we construct a benchmark, RadVLBench, including radiology interpretation tasks like medical vision-language question-answering, as well as text generation tasks ranging from captioning to report generation. We also propose a human evaluation framework. When evaluated on the real-world benchmark involving three representative modalities, 2D images (chest X-rays), multi-view images (mammograms), and 3D images (thyroid CT scans), RadFound significantly outperforms other VL foundation models on both quantitative metrics and human evaluation. In summary, the development of RadFound represents an advancement in radiology generalists, demonstrating broad applicability potential for integration into clinical workflows.

link

2024-09-24

Semantic Refocused Tuning for Open-Vocabulary Panoptic Segmentation

Open-vocabulary panoptic segmentation is an emerging task aiming to accurately segment the image into semantically meaningful masks based on a set of texts. Despite existing efforts, it remains challenging to develop a high-performing method that generalizes effectively across new domains and requires minimal training resources. Our in-depth analysis of current methods reveals a crucial insight: mask classification is the main performance bottleneck for open-vocabulary panoptic segmentation. Based on this, we propose Semantic Refocused Tuning (SMART), a novel framework that greatly enhances open-vocabulary panoptic segmentation by improving mask classification through two key innovations. First, SMART adopts a multimodal Semantic-guided Mask Attention mechanism that injects task-awareness into the regional information extraction process. This enables the model to capture task-specific and contextually relevant information for more effective mask classification. Second, it incorporates Query Projection Tuning, which strategically fine-tunes the query projection layers within the Vision Language Model (VLM) used for mask classification. This adjustment allows the model to adapt the image focus of mask tokens to new distributions with minimal training resources, while preserving the VLM's pre-trained knowledge. Extensive ablation studies confirm the superiority of our approach. Notably, SMART sets new state-of-the-art results, demonstrating improvements of up to +1.3 PQ and +5.4 mIoU across representative benchmarks, while reducing training costs by nearly 10x compared to the previous best method. Our code and data will be released. 0.906

link

2024-09-18

Towards Global Localization using Multi-Modal Object-Instance Re-Identification

Re-identification (ReID) is a critical challenge in computer vision, predominantly studied in the context of pedestrians and vehicles. However, robust object-instance ReID, which has significant implications for tasks such as autonomous exploration, long-term perception, and scene understanding, remains underexplored. In this work, we address this gap by proposing a novel dual-path object-instance re-identification transformer architecture that integrates multimodal RGB and depth information. By leveraging depth data, we demonstrate improvements in ReID across scenes that are cluttered or have varying illumination conditions. Additionally, we develop a ReID-based localization framework that enables accurate camera localization and pose identification across different viewpoints. We validate our methods using two custom-built RGB-D datasets, as well as multiple sequences from the open-source TUM RGB-D datasets. 0.828 Our approach demonstrates significant improvements in both object instance ReID (mAP of 75.18) and localization accuracy (success rate of 83% on TUM-RGBD), highlighting the essential role of object ReID in advancing robotic perception. Our models, frameworks, and datasets have been made publicly available. 0.793

link

2024-09-18

BRDF-NeRF: Neural Radiance Fields with Optical Satellite Images and BRDF Modelling

Understanding the anisotropic reflectance of complex Earth surfaces from satellite imagery is crucial for numerous applications. Neural radiance fields (NeRF) have become popular as a machine learning technique capable of deducing the bidirectional reflectance distribution function (BRDF) of a scene from multiple images. However, prior research has largely concentrated on applying NeRF to close-range imagery, estimating basic Microfacet BRDF models, which fall short for many Earth surfaces. Moreover, high-quality NeRFs generally require several images captured simultaneously, a rare occurrence in satellite imaging. To address these limitations, we propose BRDF-NeRF, developed to explicitly estimate the Rahman-Pinty-Verstraete (RPV) model, a semi-empirical BRDF model commonly employed in remote sensing. We assess our approach using two datasets: (1) Djibouti, captured in a single epoch at varying viewing angles with a fixed Sun position, and (2) Lanzhou, captured over multiple epochs with different viewing angles and Sun positions. 0.76 Our results, based on only three to four satellite images for training, demonstrate that BRDF-NeRF can effectively synthesize novel views from directions far removed from the training data and produce high-quality digital surface models (DSMs).

link

2024-09-18

Generalized Robot Learning Framework

Imitation-based robot learning has recently gained significant attention in the robotics field due to its theoretical potential for transferability and generalizability. However, it remains notoriously costly, both in terms of hardware and data collection, and deploying it in real-world environments demands meticulous setup of robots and precise experimental conditions. In this paper, we present a low-cost robot learning framework that is both easily reproducible and transferable to various robots and environments. We demonstrate that deployable imitation learning can be successfully applied even to industrial-grade robots, not just expensive collaborative robotic arms. Furthermore, our results show that multi-task robot learning is achievable with simple network architectures and fewer demonstrations than previously thought necessary. As the current evaluation method is largely subjective when it comes to real-world manipulation tasks, we propose Voting Positive Rate (VPR) - a novel evaluation strategy that provides a more objective assessment of performance. We conduct an extensive comparison of success rates across various self-designed tasks to validate our approach. To foster collaboration and support the robot learning community, we have open-sourced all relevant datasets and model checkpoints, available at huggingface.co/ZhiChengAI. 0.817

link

2024-09-18

Generalized compression and compressive search of large datasets

The Big Data explosion has necessitated the development of search algorithms that scale sub-linearly in time and memory. While compression algorithms and search algorithms do exist independently, few algorithms offer both, and those which do are domain-specific. We present panCAKES, a novel approach to compressive search, i.e., a way to perform $k$-NN and $\rho$-NN search on compressed data while only decompressing a small, relevant portion of the data. panCAKES assumes the manifold hypothesis and leverages the low-dimensional structure of the data to compress and search it efficiently. panCAKES is generic over any distance function for which the distance between two points is proportional to the memory cost of storing an encoding of one in terms of the other. This property holds for many widely-used distance functions, e.g. string edit distances (Levenshtein, Needleman-Wunsch, etc.) and set dissimilarity measures (Jaccard, Dice, etc.). We benchmark panCAKES on a variety of datasets, including genomic, proteomic, and set data. 0.796 We compare compression ratios to gzip, and search performance between the compressed and uncompressed versions of the same dataset. panCAKES achieves compression ratios close to those of gzip, while offering sub-linear time performance for $k$-NN and $\rho$-NN search. We conclude that panCAKES is an efficient, general-purpose algorithm for exact compressive search on large datasets that obey the manifold hypothesis. We provide an open-source implementation of panCAKES in the Rust programming language.

link
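
The two query types and two of the distance functions named in the panCAKES abstract above can be made concrete with a small sketch. This is a plain linear scan over uncompressed data, the baseline that a compressive search would be compared against, and not the panCAKES algorithm; all function names here are assumptions for illustration.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def jaccard_distance(a, b):
    """One minus the Jaccard similarity of two collections, as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def knn(query, data, k, distance):
    """k-NN search: the k items closest to the query (linear scan)."""
    return sorted(data, key=lambda item: distance(query, item))[:k]


def rnn(query, data, radius, distance):
    """rho-NN search: every item within `radius` of the query."""
    return [item for item in data if distance(query, item) <= radius]


if __name__ == "__main__":
    words = ["cart", "dog", "cat", "bat"]
    print(knn("cat", words, 2, levenshtein))
    print(rnn("cat", words, 1, levenshtein))
```

Both searches accept any distance function, which mirrors the abstract's claim that the method is generic over the distance; the linear scan costs one distance evaluation per item, which is what sub-linear search aims to avoid.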

Data Quality

2024-09-26

Improving Fast Adversarial Training via Self-Knowledge Guidance

Adversarial training has achieved remarkable advancements in defending against adversarial attacks. Among them, fast adversarial training (FAT) is gaining attention for its ability to achieve competitive robustness with fewer computing resources. Existing FAT methods typically employ a uniform strategy that optimizes all training data equally without considering the influence of different examples, which leads to an imbalanced optimization. However, this imbalance remains unexplored in the field of FAT. In this paper, we conduct a comprehensive study of the imbalance issue in FAT and observe an obvious class disparity regarding their performances. This disparity could be embodied from a perspective of alignment between clean and robust accuracy. Based on the analysis, we mainly attribute the observed misalignment and disparity to the imbalanced optimization in FAT, which motivates us to optimize different training data adaptively to enhance robustness. Specifically, we take disparity and misalignment into consideration. First, we introduce self-knowledge guided regularization, which assigns differentiated regularization weights to each class based on its training state, alleviating class disparity. Additionally, we propose self-knowledge guided label relaxation, which adjusts label relaxation according to the training accuracy, alleviating the misalignment and improving robustness. 0.703 By combining these methods, we formulate the Self-Knowledge Guided FAT (SKG-FAT), leveraging naturally generated knowledge during training to enhance the adversarial robustness without compromising training efficiency. Extensive experiments on four standard datasets demonstrate that the SKG-FAT improves the robustness and preserves competitive clean accuracy, outperforming the state-of-the-art methods.

link

2024-09-26

Dirichlet-Based Coarse-to-Fine Example Selection For Open-Set Annotation

Active learning (AL) has achieved great success by selecting the most valuable examples from unlabeled data. However, AL methods usually deteriorate in real scenarios where open-set noise gets involved, which is studied as open-set annotation (OSA). In this paper, we attribute the deterioration to the unreliable predictions arising from softmax-based translation invariance and propose a Dirichlet-based Coarse-to-Fine Example Selection (DCFS) strategy accordingly. Our method introduces simplex-based evidential deep learning (EDL) to break translation invariance and distinguish known and unknown classes by considering evidence-based data and distribution uncertainty simultaneously. Furthermore, hard known-class examples are identified by model discrepancy generated from two classifier heads, where we amplify and alleviate the model discrepancy respectively for unknown and known classes. 0.602 Finally, we combine the discrepancy with uncertainties to form a two-stage strategy, selecting the most informative examples from known classes. Extensive experiments on various openness ratio datasets demonstrate that DCFS achieves state-of-the-art performance.

link

2024-09-26

Bias Assessment and Data Drift Detection in Medical Image Analysis: A Survey

Machine Learning (ML) models have gained popularity in medical imaging analysis given their expert level performance in many medical domains. To enhance the trustworthiness, acceptance, and regulatory compliance of medical imaging models and to facilitate their integration into clinical settings, we review and categorise methods for ensuring ML reliability, both during development and throughout the model's lifespan. Specifically, we provide an overview of methods assessing models' inner-workings regarding bias encoding and detection of data drift for disease classification models. Additionally, to evaluate the severity in case of a significant drift, we provide an overview of the methods developed for classifier accuracy estimation in case of no access to ground truth labels. 0.679 This should enable practitioners to implement methods ensuring reliable ML deployment and consistent prediction performance over time.

link

2024-09-25

Supporting Co-Adaptive Machine Teaching through Human Concept Learning and Cognitive Theories

An important challenge in interactive machine learning, particularly in subjective or ambiguous domains, is fostering bi-directional alignment between humans and models. Users teach models their concept definition through data labeling, while refining their own understandings throughout the process. To facilitate this, we introduce MOCHA, an interactive machine learning tool informed by two theories of human concept learning and cognition. First, it utilizes a neuro-symbolic pipeline to support Variation Theory-based counterfactual data generation. By asking users to annotate counterexamples that are syntactically and semantically similar to already-annotated data but predicted to have different labels, the system can learn more effectively while helping users understand the model and reflect on their own label definitions. 0.654 Second, MOCHA uses Structural Alignment Theory to present groups of counterexamples, helping users comprehend alignable differences between data items and annotate them in batch. We validated MOCHA's effectiveness and usability through a lab study with 18 participants.

link

2024-09-25

XAI-guided Insulator Anomaly Detection for Imbalanced Datasets

Power grids serve as a vital component in numerous industries, seamlessly delivering electrical energy to industrial processes and technologies, making their safe and reliable operation indispensable. However, powerlines can be hard to inspect due to difficult terrain or harsh climatic conditions. Therefore, unmanned aerial vehicles are increasingly deployed to inspect powerlines, resulting in a substantial stream of visual data which requires swift and accurate processing. Deep learning methods have become widely popular for this task, proving to be a valuable asset in fault detection. In particular, the detection of insulator defects is crucial for predicting powerline failures, since their malfunction can lead to transmission disruptions. It is therefore of great interest to continuously maintain and rigorously inspect insulator components. In this work we propose a novel pipeline to tackle this task. We utilize state-of-the-art object detection to detect and subsequently classify individual insulator anomalies. Our approach addresses dataset challenges such as imbalance and motion-blurred images through a fine-tuning methodology which allows us to alter the classification focus of the model by increasing the classification accuracy of anomalous insulators. In addition, we employ explainable-AI tools for precise localization and explanation of anomalies. This proposed method contributes to the field of anomaly detection, particularly vision-based industrial inspection and predictive maintenance. We significantly improve defect detection accuracy by up to 13%, while also offering a detailed analysis of model mis-classifications and localization quality, showcasing the potential of our method on real-world data. 0.654

link

2024-09-24

Learning with Confidence: Training Better Classifiers from Soft Labels

In supervised machine learning, models are typically trained using data with hard labels, i.e., definite assignments of class membership. This traditional approach, however, does not take the inherent uncertainty in these labels into account. We investigate whether incorporating label uncertainty, represented as discrete probability distributions over the class labels -- known as soft labels -- improves the predictive performance of classification models. 0.683 We first demonstrate the potential value of soft label learning (SLL) for estimating model parameters in a simulation experiment, particularly for limited sample sizes and imbalanced data. Subsequently, we compare the performance of various wrapper methods for learning from both hard and soft labels using identical base classifiers. 0.605 On real-world-inspired synthetic data with clean labels, the SLL methods consistently outperform hard label methods. Since real-world data is often noisy and precise soft labels are challenging to obtain, we study the effect that noisy probability estimates have on model performance. Alongside conventional noise models, our study examines four types of miscalibration that are known to affect human annotators. 0.781 The results show that SLL methods outperform the hard label methods in the majority of settings. Finally, we evaluate the methods on a real-world dataset with confidence scores, where the SLL methods are shown to match the traditional methods for predicting the (noisy) hard labels while providing more accurate confidence estimates.

link

2024-09-16

Efficiently Crowdsourcing Visual Importance with Punch-Hole Annotation

We introduce a novel crowdsourcing method for identifying important areas in graphical images through punch-hole labeling. Traditional methods, such as gaze trackers and mouse-based annotations, which generate continuous data, can be impractical in crowdsourcing scenarios. They require many participants, and the outcome data can be noisy. In contrast, our method first segments the graphical image with a grid and drops a portion of the patches (punch holes). Then, we iteratively ask the labeler to validate each annotation with holes, narrowing down the annotation only having the most important area. 0.767 This approach aims to reduce annotation noise in crowdsourcing by standardizing the annotations while enhancing labeling efficiency and reliability. 0.779 Preliminary findings from fundamental charts demonstrate that punch-hole labeling can effectively pinpoint critical regions. This also highlights its potential for broader application in visualization research, particularly in studying large-scale users' graphical perception. Our future work aims to enhance the algorithm to achieve faster labeling speed and prove its utility through large-scale experiments.

link

2024-09-16

Model-in-the-Loop (MILO): Accelerating Multimodal AI Data Annotation with LLMs

The growing demand for AI training data has transformed data annotation into a global industry, but traditional approaches relying on human annotators are often time-consuming, labor-intensive, and prone to inconsistent quality. We propose the Model-in-the-Loop (MILO) framework, which integrates AI/ML models into the annotation process. Our research introduces a collaborative paradigm that leverages the strengths of both professional human annotators and large language models (LLMs). By employing LLMs as pre-annotation and real-time assistants, and judges on annotator responses, MILO enables effective interaction patterns between human annotators and LLMs. Three empirical studies on multimodal data annotation demonstrate MILO's efficacy in reducing handling time, improving data quality, and enhancing annotator experiences. We also introduce quality rubrics for flexible evaluation and fine-grained feedback on open-ended annotations. 0.63 The MILO framework has implications for accelerating AI/ML development, reducing reliance on human annotation alone, and promoting better alignment between human and machine values.

link

2024-09-12

Task-Augmented Cross-View Imputation Network for Partial Multi-View Incomplete Multi-Label Classification

In real-world scenarios, multi-view multi-label learning often encounters the challenge of incomplete training data due to limitations in data collection and unreliable annotation processes. 0.68 The absence of multi-view features impairs the comprehensive understanding of samples, omitting crucial details essential for classification. To address this issue, we present a task-augmented cross-view imputation network (TACVI-Net) for the purpose of handling partial multi-view incomplete multi-label classification. Specifically, we employ a two-stage network to derive highly task-relevant features to recover the missing views. In the first stage, we leverage the information bottleneck theory to obtain a discriminative representation of each view by extracting task-relevant information through a view-specific encoder-classifier architecture. In the second stage, an autoencoder based multi-view reconstruction network is utilized to extract high-level semantic representation of the augmented features and recover the missing data, thereby aiding the final classification task. Extensive experiments on five datasets demonstrate that our TACVI-Net outperforms other state-of-the-art methods.

link

2024-09-11

Muskits-ESPnet: A Comprehensive Toolkit for Singing Voice Synthesis in New Paradigm

This research presents Muskits-ESPnet, a versatile toolkit that introduces new paradigms to Singing Voice Synthesis (SVS) through the application of pretrained audio models in both continuous and discrete approaches. Specifically, we explore discrete representations derived from SSL models and audio codecs and offer significant advantages in versatility and intelligence, supporting multi-format inputs and adaptable data processing workflows for various SVS models. The toolkit features automatic music score error detection and correction, as well as a perception auto-evaluation module to imitate human subjective evaluating scores. 0.658 Muskits-ESPnet is available at https://github.com/espnet/espnet.

link

2024-09-10

Keyword-Aware ASR Error Augmentation for Robust Dialogue State Tracking

Dialogue State Tracking (DST) is a key part of task-oriented dialogue systems, identifying important information in conversations. However, its accuracy drops significantly in spoken dialogue environments due to named entity errors from Automatic Speech Recognition (ASR) systems. We introduce a simple yet effective data augmentation method that targets those entities to improve the robustness of DST model. Our novel method can control the placement of errors using keyword-highlighted prompts while introducing phonetically similar errors. 0.63 As a result, our method generated sufficient error patterns on keywords, leading to improved accuracy in noised and low-accuracy ASR environments.

link

Benchmarks

2024-09-26

SCOMatch: Alleviating Overtrusting in Open-set Semi-supervised Learning

Open-set semi-supervised learning (OSSL) leverages practical open-set unlabeled data, comprising both in-distribution (ID) samples from seen classes and out-of-distribution (OOD) samples from unseen classes, for semi-supervised learning (SSL). Prior OSSL methods initially learned the decision boundary between ID and OOD with labeled ID data, subsequently employing self-training to refine this boundary. These methods, however, suffer from the tendency to overtrust the labeled ID data: the scarcity of labeled data caused the distribution bias between the labeled samples and the entire ID data, which misleads the decision boundary to overfit. The subsequent self-training process, based on the overfitted result, fails to rectify this problem. In this paper, we address the overtrusting issue by treating OOD samples as an additional class, forming a new SSL process. Specifically, we propose SCOMatch, a novel OSSL method that 1) selects reliable OOD samples as new labeled data with an OOD memory queue and a corresponding update strategy and 2) integrates the new SSL process into the original task through our Simultaneous Close-set and Open-set self-training. SCOMatch refines the decision boundary of ID and OOD classes across the entire dataset, thereby leading to improved results. Extensive experimental results show that SCOMatch significantly outperforms the state-of-the-art methods on various benchmarks. 0.739 The effectiveness is further verified through ablation studies and visualization.

link

2024-09-26

Functional Classification of Spiking Signal Data Using Artificial Intelligence Techniques: A Review

Human brain neuron activities are incredibly significant nowadays. Neuronal behavior is assessed by analyzing signal data such as electroencephalography (EEG), which can offer scientists valuable information about diseases and human-computer interaction. One of the difficulties researchers confront while evaluating these signals is the existence of large volumes of spike data. Spikes are some considerable parts of signal data that can happen as a consequence of vital biomarkers or physical issues such as electrode movements. Hence, distinguishing types of spikes is important. From this spot, the spike classification concept commences. Previously, researchers classified spikes manually. The manual classification was not precise enough as it involves extensive analysis. Consequently, Artificial Intelligence (AI) was introduced into neuroscience to assist clinicians in classifying spikes correctly. This review discusses the importance and use of AI in spike classification, focusing on the recognition of neural activity noises. The task is divided into three main components: preprocessing, classification, and evaluation. Existing methods are introduced and their importance is determined. The review also highlights the need for more efficient algorithms. 0.719 The primary goal is to provide a perspective on spike classification for future research and provide a comprehensive understanding of the methodologies and issues involved. The review organizes materials in the spike classification field for future studies. In this work, numerous studies were extracted from different databases. The PRISMA-related research guidelines were then used to choose papers. Then, research studies based on spike classification using machine learning and deep learning approaches with effective preprocessing were selected.

link

2024-09-26

Dataset Distillation-based Hybrid Federated Learning on Non-IID Data

In federated learning, the heterogeneity of client data has a great impact on the performance of model training. Many heterogeneity issues in this process are raised by non-independently and identically distributed (Non-IID) data. This study focuses on the issue of label distribution skew. To address it, we propose a hybrid federated learning framework called HFLDD, which integrates dataset distillation to generate approximately independent and equally distributed (IID) data, thereby improving the performance of model training. Particularly, we partition the clients into heterogeneous clusters, where the data labels among different clients within a cluster are unbalanced while the data labels among different clusters are balanced. The cluster headers collect distilled data from the corresponding cluster members, and conduct model training in collaboration with the server. This training process is like traditional federated learning on IID data, and hence effectively alleviates the impact of Non-IID data on model training. Furthermore, we compare our proposed method with typical baseline methods on public datasets. 0.754 Experimental results demonstrate that when the data labels are severely imbalanced, the proposed HFLDD outperforms the baseline methods in terms of both test accuracy and communication cost. 0.631

link

2024-09-26

SimVG: A Simple Framework for Visual Grounding with Decoupled Multi-modal Fusion

Visual grounding is a common vision task that involves grounding descriptive sentences to the corresponding regions of an image. Most existing methods use independent image-text encoding and apply complex hand-crafted modules or encoder-decoder architectures for modal interaction and query reasoning. However, their performance significantly drops when dealing with complex textual expressions. This is because the former paradigm only utilizes limited downstream data to fit the multi-modal feature fusion. Therefore, it is only effective when the textual expressions are relatively simple. In contrast, given the wide diversity of textual expressions and the uniqueness of downstream training data, the existing fusion module, which extracts multimodal content from a visual-linguistic context, has not been fully investigated. In this paper, we present a simple yet robust transformer-based framework, SimVG, for visual grounding. Specifically, we decouple visual-linguistic feature fusion from downstream tasks by leveraging existing multimodal pre-trained models and incorporating additional object tokens to facilitate deep integration of downstream and pre-training tasks. Furthermore, we design a dynamic weight-balance distillation method in the multi-branch synchronous learning process to enhance the representation capability of the simpler branch. This branch only consists of a lightweight MLP, which simplifies the structure and improves reasoning speed. Experiments on six widely used VG datasets, i.e., RefCOCO/+/g, ReferIt, Flickr30K, and GRefCOCO, demonstrate the superiority of SimVG. Finally, the proposed method not only achieves improvements in efficiency and convergence speed but also attains new state-of-the-art performance on these benchmarks. 0.817 Codes and models will be available at https://github.com/Dmmm1997/SimVG.

link

2024-09-26

Privacy-Preserving Redaction of Diagnosis Data through Source Code Analysis

Protecting sensitive information in diagnostic data such as logs, is a critical concern in the industrial software diagnosis and debugging process. While there are many tools developed to automatically redact the logs for identifying and removing sensitive information, they have severe limitations which can cause either over redaction and loss of critical diagnostic information (false positives), or disclosure of sensitive information (false negatives), or both. To address the problem, in this paper, we argue for a source code analysis approach for log redaction. To identify a log message containing sensitive information, our method locates the corresponding log statement in the source code with logger code augmentation, and checks if the log statement outputs data from sensitive sources by using the data flow graph built from the source code. Appropriate redaction rules are further applied depending on the sensitiveness of the data sources to preserve the privacy information in the logs. We conducted experimental evaluation and comparison with other popular baselines. 0.83 The results demonstrate that our approach can significantly improve the detection precision of the sensitive information and reduce both false positives and negatives.

link

2024-09-26

A Simple but Strong Baseline for Sounding Video Generation: Effective Adaptation of Audio and Video Diffusion Models for Joint Generation

In this work, we build a simple but strong baseline for sounding video generation. Given base diffusion models for audio and video, we integrate them with additional modules into a single model and train it to make the model jointly generate audio and video. To enhance alignment between audio-video pairs, we introduce two novel mechanisms in our model. The first one is timestep adjustment, which provides different timestep information to each base model. It is designed to align how samples are generated along with timesteps across modalities. The second one is a new design of the additional modules, termed Cross-Modal Conditioning as Positional Encoding (CMC-PE). In CMC-PE, cross-modal information is embedded as if it represents temporal position information, and the embeddings are fed into the model like positional encoding. Compared with the popular cross-attention mechanism, CMC-PE provides a better inductive bias for temporal alignment in the generated data. Experimental results validate the effectiveness of the two newly introduced mechanisms and also demonstrate that our method outperforms existing methods. 0.74

link

2024-09-26

ID$^3$: Identity-Preserving-yet-Diversified Diffusion Models for Synthetic Face Recognition

Synthetic face recognition (SFR) aims to generate synthetic face datasets that mimic the distribution of real face data, which allows for training face recognition models in a privacy-preserving manner. Despite the remarkable potential of diffusion models in image generation, current diffusion-based SFR models struggle with generalization to real-world faces. To address this limitation, we outline three key objectives for SFR: (1) promoting diversity across identities (inter-class diversity), (2) ensuring diversity within each identity by injecting various facial attributes (intra-class diversity), and (3) maintaining identity consistency within each identity group (intra-class identity preservation). Inspired by these goals, we introduce a diffusion-fueled SFR model termed $\text{ID}^3$. $\text{ID}^3$ employs an ID-preserving loss to generate diverse yet identity-consistent facial appearances. Theoretically, we show that minimizing this loss is equivalent to maximizing the lower bound of an adjusted conditional log-likelihood over ID-preserving data. This equivalence motivates an ID-preserving sampling algorithm, which operates over an adjusted gradient vector field, enabling the generation of fake face recognition datasets that approximate the distribution of real-world faces. Extensive experiments across five challenging benchmarks validate the advantages of $\text{ID}^3$. 0.664

link

2024-09-26

A Scalable Data-Driven Framework for Systematic Analysis of SEC 10-K Filings Using Large Language Models

The number of companies listed on the NYSE has been growing exponentially, creating a significant challenge for market analysts, traders, and stockholders who must monitor and assess the performance and strategic shifts of a large number of companies regularly. There is an increasing need for a fast, cost-effective, and comprehensive method to evaluate the performance and detect and compare many companies' strategy changes efficiently. 0.673 We propose a novel data-driven approach that leverages large language models (LLMs) to systematically analyze and rate the performance of companies based on their SEC 10-K filings. These filings, which provide detailed annual reports on a company's financial performance and strategic direction, serve as a rich source of data for evaluating various aspects of corporate health, including confidence, environmental sustainability, innovation, and workforce management. We also introduce an automated system for extracting and preprocessing 10-K filings. This system accurately identifies and segments the required sections as outlined by the SEC, while also isolating key textual content that contains critical information about the company. This curated data is then fed into Cohere's Command-R+ LLM to generate quantitative ratings across various performance metrics. These ratings are subsequently processed and visualized to provide actionable insights. The proposed scheme is then implemented on an interactive GUI as a no-code solution for running the data pipeline and creating the visualizations. The application showcases the rating results and provides year-on-year comparisons of company performance. 0.671

link

2024-09-26

Multiplicative Logit Adjustment Approximates Neural-Collapse-Aware Decision Boundary Adjustment

Real-world data distributions are often highly skewed. This has spurred a growing body of research on long-tailed recognition to address this imbalance in training classification models. Among the methods studied, multiplicative logit adjustment (MLA) stands out as a simple and effective method. However, it lacks theoretical guarantees, which raises concerns about the optimality of its adjustment method. We provide a theoretical justification for the effectiveness of MLA with the following two-step theory. First, we develop a theory that adjusts optimal decision boundaries by estimating feature spread on the basis of neural collapse. Then, we demonstrate that MLA approximates this optimal method. 0.668 Additionally, through experiments on long-tailed datasets, we illustrate the practical usefulness of MLA under more realistic conditions. We also offer experimental insights to guide the tuning of MLA's hyperparameters.

link

2024-09-26

Multimodal Banking Dataset: Understanding Client Needs through Event Sequences

Financial organizations collect a huge amount of data about clients that typically has a temporal (sequential) structure and is collected from various sources (modalities). Due to privacy issues, there are no large-scale open-source multimodal datasets of event sequences, which significantly limits the research in this area. In this paper, we present the industrial-scale publicly available multimodal banking dataset, MBD, that contains more than 1.5M corporate clients with several modalities: 950M bank transactions, 1B geo position events, 5M embeddings of dialogues with technical support and monthly aggregated purchases of four bank's products. All entries are properly anonymized from real proprietary bank data. Using this dataset, we introduce a novel benchmark with two business tasks: campaigning (purchase prediction in the next month) and matching of clients. 0.641 We provide numerical results that demonstrate the superiority of our multi-modal baselines over single-modal techniques for each task. As a result, the proposed dataset can open new perspectives and facilitate the future development of practically important large-scale multimodal algorithms for event sequences. HuggingFace Link: https://huggingface.co/datasets/ai-lab/MBD Github Link: https://github.com/Dzhambo/MBD

link

2024-09-26

Improving Fast Adversarial Training via Self-Knowledge Guidance

Adversarial training has achieved remarkable advancements in defending against adversarial attacks. Among them, fast adversarial training (FAT) is gaining attention for its ability to achieve competitive robustness with fewer computing resources. Existing FAT methods typically employ a uniform strategy that optimizes all training data equally without considering the influence of different examples, which leads to an imbalanced optimization. However, this imbalance remains unexplored in the field of FAT. In this paper, we conduct a comprehensive study of the imbalance issue in FAT and observe an obvious class disparity regarding their performances. This disparity could be embodied from a perspective of alignment between clean and robust accuracy. Based on the analysis, we mainly attribute the observed misalignment and disparity to the imbalanced optimization in FAT, which motivates us to optimize different training data adaptively to enhance robustness. Specifically, we take disparity and misalignment into consideration. First, we introduce self-knowledge guided regularization, which assigns differentiated regularization weights to each class based on its training state, alleviating class disparity. Additionally, we propose self-knowledge guided label relaxation, which adjusts label relaxation according to the training accuracy, alleviating the misalignment and improving robustness. By combining these methods, we formulate the Self-Knowledge Guided FAT (SKG-FAT), leveraging naturally generated knowledge during training to enhance the adversarial robustness without compromising training efficiency. Extensive experiments on four standard datasets demonstrate that the SKG-FAT improves the robustness and preserves competitive clean accuracy, outperforming the state-of-the-art methods. 0.673

link

2024-09-26

Prototype based Masked Audio Model for Self-Supervised Learning of Sound Event Detection

A significant challenge in sound event detection (SED) is the effective utilization of unlabeled data, given the limited availability of labeled data due to high annotation costs. Semi-supervised algorithms rely on labeled data to learn from unlabeled data, and the performance is constrained by the quality and size of the former. In this paper, we introduce the Prototype based Masked Audio Model (PMAM) algorithm for self-supervised representation learning in SED, to better exploit unlabeled data. Specifically, semantically rich frame-level pseudo labels are constructed from a Gaussian mixture model (GMM) based prototypical distribution modeling. These pseudo labels supervise the learning of a Transformer-based masked audio model, in which binary cross-entropy loss is employed instead of the widely used InfoNCE loss, to provide independent loss contributions from different prototypes, which is important in real scenarios in which multiple labels may apply to unsupervised data frames. A final stage of fine-tuning with just a small amount of labeled data yields a very high performing SED model. On like-for-like tests using the DESED task, our method achieves a PSDS1 score of 62.5%, surpassing current state-of-the-art models and demonstrating the superiority of the proposed technique. 0.628

link

2024-09-26

Zero- and Few-shot Named Entity Recognition and Text Expansion in Medication Prescriptions using ChatGPT

Introduction: Medication prescriptions are often in free text and include a mix of two languages, local brand names, and a wide range of idiosyncratic formats and abbreviations. Large language models (LLMs) have shown promising ability to generate text in response to input prompts. We use ChatGPT 3.5 to automatically structure and expand medication statements in discharge summaries and thus make them easier to interpret for people and machines. Methods: Named-entity Recognition (NER) and Text Expansion (EX) are used in a zero- and few-shot setting with different prompt strategies. 100 medication statements were manually annotated and curated. NER performance was measured by using strict and partial matching. 0.638 For the task EX, two experts interpreted the results by assessing semantic equivalence between original and expanded statements. The model performance was measured by precision, recall, and F1 score. 0.637 Results: For NER, the best-performing prompt reached an average F1 score of 0.94 in the test set. For EX, the few-shot prompt showed superior performance among other prompts, with an average F1 score of 0.87. Conclusion: Our study demonstrates good performance for NER and EX tasks in free-text medication statements using ChatGPT. Compared to a zero-shot baseline, a few-shot approach prevented the system from hallucinating, which would be unacceptable when processing safety-relevant medication data.

link

2024-09-26

Confidence intervals uncovered: Are we ready for real-world medical imaging AI?

Medical imaging is spearheading the AI transformation of healthcare. Performance reporting is key to determine which methods should be translated into clinical practice. 0.624 Frequently, broad conclusions are simply derived from mean performance values. In this paper, we argue that this common practice is often a misleading simplification as it ignores performance variability. Our contribution is threefold. (1) Analyzing all MICCAI segmentation papers (n = 221) published in 2023, we first observe that more than 50% of papers do not assess performance variability at all. Moreover, only one (0.5%) paper reported confidence intervals (CIs) for model performance. (2) To address the reporting bottleneck, we show that the unreported standard deviation (SD) in segmentation papers can be approximated by a second-order polynomial function of the mean Dice similarity coefficient (DSC). Based on external validation data from 56 previous MICCAI challenges, we demonstrate that this approximation can accurately reconstruct the CI of a method using information provided in publications. 0.605 (3) Finally, we reconstructed 95% CIs around the mean DSC of MICCAI 2023 segmentation papers. The median CI width was 0.03 which is three times larger than the median performance gap between the first and second ranked method. 0.61 For more than 60% of papers, the mean performance of the second-ranked method was within the CI of the first-ranked method. 0.71 We conclude that current publications typically do not provide sufficient evidence to support which models could potentially be translated into clinical practice.

link
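The reconstruction idea in the abstract above (approximate the unreported SD from the mean DSC, then build a CI) can be sketched in a few lines. This is only an illustration: the polynomial coefficients `A`, `B`, `C`, the test-set size, and the normal-approximation interval are assumptions made up for the example, not values or formulas taken from the paper.

```python
import math

def approx_sd(mean_dsc, a, b, c):
    """Second-order polynomial approximation of the SD from the mean DSC."""
    # Clamp at zero so a poorly chosen polynomial never yields a negative SD.
    return max(0.0, a * mean_dsc ** 2 + b * mean_dsc + c)

def normal_ci(mean, sd, n, z=1.96):
    """95% CI of the mean under a normal approximation (an assumption)."""
    half_width = z * sd / math.sqrt(n)
    return (mean - half_width, mean + half_width)

# Hypothetical coefficients and test-set size, chosen only for illustration.
A, B, C = -0.5, 0.3, 0.25
first_ci = normal_ci(0.90, approx_sd(0.90, A, B, C), n=100)

# Does the runner-up's mean fall inside the winner's interval?
second_mean = 0.89
second_within_ci = first_ci[0] <= second_mean <= first_ci[1]
print(first_ci, second_within_ci)
```

If the second-ranked mean lands inside the first-ranked interval, the ranking alone says little about which method is actually better, which is the situation the abstract reports for most papers.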

2024-09-26

Harnessing Shared Relations via Multimodal Mixup Contrastive Learning for Multimodal Classification

Deep multimodal learning has shown remarkable success by leveraging contrastive learning to capture explicit one-to-one relations across modalities.However, real-world data often exhibits shared relations beyond simple pairwise associations.We propose M3CoL, a Multimodal Mixup Contrastive Learning approach to capture nuanced shared relations inherent in multimodal data.Our key contribution is a Mixup-based contrastive loss that learns robust representations by aligning mixed samples from one modality with their corresponding samples from other modalities thereby capturing shared relations between them.For multimodal classification tasks, we introduce a framework that integrates a fusion module with unimodal prediction modules for auxiliary supervision during training, complemented by our proposed Mixup-based contrastive loss.Through extensive experiments on diverse datasets (N24News, ROSMAP, BRCA, and Food-101), we demonstrate that M3CoL effectively captures shared multimodal relations and generalizes across domains.It outperforms state-of-the-art methods on N24News, ROSMAP, and BRCA, while achieving comparable performance on Food-101. 0.603Our work highlights the significance of learning shared relations for robust multimodal learning, opening up promising avenues for future research.

link

2024-09-26

A method for identifying causality in the response of nonlinear dynamical systems

Predicting the response of nonlinear dynamical systems subject to random, broadband excitation is important across a range of scientific disciplines, such as structural dynamics and neuroscience.Building data-driven models requires experimental measurements of the system input and output, but it can be difficult to determine whether inaccuracies in the model stem from modelling errors or noise.This paper presents a novel method to identify the causal component of the input-output data from measurements of a system in the presence of output noise, as a function of frequency, without needing a high fidelity model.An output prediction, calculated using an available model, is optimally combined with noisy measurements of the output to predict the input to the system.The parameters of the algorithm balance the two output signals and are utilised to calculate a nonlinear coherence metric as a measure of causality.This method is applicable to a broad class of nonlinear dynamical systems.There are currently no solutions to this problem in the absence of a complete benchmark model. 0.718

link

2024-09-26

Self-supervised Monocular Depth Estimation with Large Kernel Attention

Self-supervised monocular depth estimation has emerged as a promising approach since it does not rely on labeled training data. Most methods combine convolution and Transformer to model long-distance dependencies to estimate depth accurately. However, Transformer treats 2D image features as 1D sequences, and positional encoding somewhat mitigates the loss of spatial information between different feature blocks, tending to overlook channel features, which limits the performance of depth estimation. In this paper, we propose a self-supervised monocular depth estimation network to get finer details. Specifically, we propose a decoder based on large kernel attention, which can model long-distance dependencies without compromising the two-dimensional structure of features while maintaining feature channel adaptivity. In addition, we introduce an up-sampling module to accurately recover the fine details in the depth map. Our method achieves competitive results on the KITTI dataset. 0.658

link

2024-09-26

SShaDe: a framework for scalable shape deformation via local representations

With the increase of computational power for the available hardware, the demand for high-resolution data in computer graphics applications increases.Consequently, classical geometry processing techniques based on linear algebra solutions are starting to become obsolete.In this setting, we propose a novel approach for tackling mesh deformation tasks on high-resolution meshes.By reducing the input size with a fast remeshing technique and preserving a consistent representation of the original mesh with local reference frames, we provide a solution that is both scalable and robust.We extensively test our technique and compare it against state-of-the-art methods, proving that our approach can handle meshes with hundreds of thousands of vertices in tens of seconds while still achieving results comparable with the other solutions. 0.606

link

2024-09-26

Self-supervised Pretraining for Cardiovascular Magnetic Resonance Cine Segmentation

Self-supervised pretraining (SSP) has shown promising results in learning from large unlabeled datasets and, thus, could be useful for automated cardiovascular magnetic resonance (CMR) short-axis cine segmentation. However, inconsistent reports of the benefits of SSP for segmentation have made it difficult to apply SSP to CMR. Therefore, this study aimed to evaluate SSP methods for CMR cine segmentation. To this end, short-axis cine stacks of 296 subjects (90618 2D slices) were used for unlabeled pretraining with four SSP methods: SimCLR, positional contrastive learning, DINO, and masked image modeling (MIM). Subsets of varying numbers of subjects were used for supervised fine-tuning of 2D models for each SSP method, as well as to train a 2D baseline model from scratch. The fine-tuned models were compared to the baseline using the 3D Dice similarity coefficient (DSC) in a test dataset of 140 subjects. The SSP methods showed no performance gains with the largest supervised fine-tuning subset compared to the baseline (DSC = 0.89). 0.618 When only 10 subjects (231 2D slices) are available for supervised training, SSP using MIM (DSC = 0.86) improves over training from scratch (DSC = 0.82). This study found that SSP is valuable for CMR cine segmentation when labeled training data is scarce, but does not aid state-of-the-art deep learning methods when ample labeled data is available. Moreover, the choice of SSP method is important. The code is publicly available at: https://github.com/q-cardIA/ssp-cmr-cine-segmentation

link

2024-09-26

E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding

Recent advances in Video Large Language Models (Video-LLMs) have demonstrated their great potential in general-purpose video understanding.To verify the significance of these models, a number of benchmarks have been proposed to diagnose their capabilities in different scenarios. 0.69However, existing benchmarks merely evaluate models through video-level question-answering, lacking fine-grained event-level assessment and task diversity.To fill this gap, we introduce E.T. Bench (Event-Level & Time-Sensitive Video Understanding Benchmark), a large-scale and high-quality benchmark for open-ended event-level video understanding.Categorized within a 3-level task taxonomy, E.T. Bench encompasses 7.3K samples under 12 tasks with 7K videos (251.4h total length) under 8 domains, providing comprehensive evaluations.We extensively evaluated 8 Image-LLMs and 12 Video-LLMs on our benchmark, and the results reveal that state-of-the-art models for coarse-level (video-level) understanding struggle to solve our fine-grained tasks, e.g., grounding event-of-interests within videos, largely due to the short video context length, improper time representations, and lack of multi-event training data.Focusing on these issues, we further propose a strong baseline model, E.T. Chat, together with an instruction-tuning dataset E.T. Instruct 164K tailored for fine-grained event-level understanding.Our simple but effective solution demonstrates superior performance in multiple scenarios. 0.736

link

2024-09-26

Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction

Leveraging the visual priors of pre-trained text-to-image diffusion models offers a promising solution to enhance zero-shot generalization in dense prediction tasks.However, existing methods often uncritically use the original diffusion formulation, which may not be optimal due to the fundamental differences between dense prediction and image generation.In this paper, we provide a systemic analysis of the diffusion formulation for the dense prediction, focusing on both quality and efficiency.And we find that the original parameterization type for image generation, which learns to predict noise, is harmful for dense prediction; the multi-step noising/denoising diffusion process is also unnecessary and challenging to optimize.Based on these insights, we introduce Lotus, a diffusion-based visual foundation model with a simple yet effective adaptation protocol for dense prediction.Specifically, Lotus is trained to directly predict annotations instead of noise, thereby avoiding harmful variance.We also reformulate the diffusion process into a single-step procedure, simplifying optimization and significantly boosting inference speed.Additionally, we introduce a novel tuning strategy called detail preserver, which achieves more accurate and fine-grained predictions.Without scaling up the training data or model capacity, Lotus achieves SoTA performance in zero-shot depth and normal estimation across various datasets.It also significantly enhances efficiency, being hundreds of times faster than most existing diffusion-based methods. 0.622

link

2024-09-25

Train Once, Deploy Anywhere: Matryoshka Representation Learning for Multimodal Recommendation

Despite recent advancements in language and vision modeling, integrating rich multimodal knowledge into recommender systems continues to pose significant challenges.This is primarily due to the need for efficient recommendation, which requires adaptive and interactive responses.In this study, we focus on sequential recommendation and introduce a lightweight framework called full-scale Matryoshka representation learning for multimodal recommendation (fMRLRec).Our fMRLRec captures item features at different granularities, learning informative representations for efficient recommendation across multiple dimensions.To integrate item features from diverse modalities, fMRLRec employs a simple mapping to project multimodal item features into an aligned feature space.Additionally, we design an efficient linear transformation that embeds smaller features into larger ones, substantially reducing memory requirements for large-scale training on recommendation data.Combined with improved state space modeling techniques, fMRLRec scales to different dimensions and only requires one-time training to produce multiple models tailored to various granularities.We demonstrate the effectiveness and efficiency of fMRLRec on multiple benchmark datasets, which consistently achieves superior performance over state-of-the-art baseline methods. 0.761

link

2024-09-25

Pre-trained Language Models Return Distinguishable Probability Distributions to Unfaithfully Hallucinated Texts

In this work, we show that pre-trained language models return distinguishable generation probability and uncertainty distributions to unfaithfully hallucinated texts, regardless of their size and structure. By examining 24 models on 6 data sets, we find that 88-98% of cases return statistically significantly distinguishable generation probability and uncertainty distributions. Using this general phenomenon, we showcase a hallucination-reducing training algorithm. Our algorithm outperforms other baselines by achieving higher faithfulness metrics while maintaining sound general text quality measures. 0.631

link

2024-09-25

Online 6DoF Pose Estimation in Forests using Cross-View Factor Graph Optimisation and Deep Learned Re-localisation

This paper presents a novel approach for robust global localisation and 6DoF pose estimation of ground robots in forest environments by leveraging cross-view factor graph optimisation and deep-learned re-localisation.The proposed method addresses the challenges of aligning aerial and ground data for pose estimation, which is crucial for accurate point-to-point navigation in GPS-denied environments.By integrating information from both perspectives into a factor graph framework, our approach effectively estimates the robot's global position and orientation.We validate the performance of our method through extensive experiments in diverse forest scenarios, demonstrating its superiority over existing baselines in terms of accuracy and robustness in these challenging environments. 0.743Experimental results show that our proposed localisation system can achieve drift-free localisation with bounded positioning errors, ensuring reliable and safe robot navigation under canopies.

link

2024-09-25

A Multi-Dataset Classification-Based Deep Learning Framework for Electronic Health Records and Predictive Analysis in Healthcare

In contemporary healthcare, to protect patient data, electronic health records have become invaluable repositories, creating vast opportunities to leverage deep learning techniques for predictive analysis.Retinal fundus images, cirrhosis stages, and heart disease diagnostic predictions have shown promising results through the integration of deep learning techniques for classifying diverse datasets.This study proposes a novel deep learning predictive analysis framework for classifying multiple datasets by pre-processing data from three distinct sources.A hybrid deep learning model combining Residual Networks and Artificial Neural Networks is proposed to detect acute and chronic diseases such as heart diseases, cirrhosis, and retinal conditions, outperforming existing models.Dataset preparation involves aspects such as categorical data transformation, dimensionality reduction, and missing data synthesis.Feature extraction is effectively performed using scaler transformation for categorical datasets and ResNet architecture for image datasets.The resulting features are integrated into a unified classification model.Rigorous experimentation and evaluation resulted in high accuracies of 93%, 99%, and 95% for retinal fundus images, cirrhosis stages, and heart disease diagnostic predictions, respectively.The efficacy of the proposed method is demonstrated through a detailed analysis of F1-score, precision, and recall metrics. 0.712This study offers a comprehensive exploration of methodologies and experiments, providing in-depth knowledge of deep learning predictive analysis in electronic health records.

link

2024-09-25

Navigating the Maze of Explainable AI: A Systematic Approach to Evaluating Methods and Metrics

Explainable AI (XAI) is a rapidly growing domain with a myriad of proposed methods as well as metrics aiming to evaluate their efficacy.However, current studies are often of limited scope, examining only a handful of XAI methods and ignoring underlying design parameters for performance, such as the model architecture or the nature of input data.Moreover, they often rely on one or a few metrics and neglect thorough validation, increasing the risk of selection bias and ignoring discrepancies among metrics.These shortcomings leave practitioners confused about which method to choose for their problem.In response, we introduce LATEC, a large-scale benchmark that critically evaluates 17 prominent XAI methods using 20 distinct metrics. 0.61We systematically incorporate vital design parameters like varied architectures and diverse input modalities, resulting in 7,560 examined combinations.Through LATEC, we showcase the high risk of conflicting metrics leading to unreliable rankings and consequently propose a more robust evaluation scheme. 0.643Further, we comprehensively evaluate various XAI methods to assist practitioners in selecting appropriate methods aligning with their needs.Curiously, the emerging top-performing method, Expected Gradients, is not examined in any relevant related study.LATEC reinforces its role in future XAI research by publicly releasing all 326k saliency maps and 378k metric scores as a (meta-)evaluation dataset.

link

2024-09-25

MaViLS, a Benchmark Dataset for Video-to-Slide Alignment, Assessing Baseline Accuracy with a Multimodal Alignment Algorithm Leveraging Speech, OCR, and Visual Features

This paper presents a benchmark dataset for aligning lecture videos with corresponding slides and introduces a novel multimodal algorithm leveraging features from speech, text, and images.It achieves an average accuracy of 0.82 in comparison to SIFT (0.56) while being approximately 11 times faster. 0.643Using dynamic programming the algorithm tries to determine the optimal slide sequence.The results show that penalizing slide transitions increases accuracy.Features obtained via optical character recognition (OCR) contribute the most to a high matching accuracy, followed by image features.The findings highlight that audio transcripts alone provide valuable information for alignment and are beneficial if OCR data is lacking.Variations in matching accuracy across different lectures highlight the challenges associated with video quality and lecture style.The novel multimodal algorithm demonstrates robustness to some of these challenges, underscoring the potential of the approach.

link
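The MaViLS abstract mentions dynamic programming over slide sequences and a penalty on slide transitions. A generic Viterbi-style sketch of that idea follows; it is not the authors' algorithm. The score matrix, the constant `switch_penalty`, and the assumption that any slide may follow any other are all hypothetical.

```python
def align_slides(scores, switch_penalty):
    """Pick one slide per video segment, maximizing total similarity
    minus a fixed penalty for every change of slide.

    scores[t][s] is the similarity of segment t to slide s.
    """
    n_slides = len(scores[0])
    best = list(scores[0])   # best total score of a path ending on each slide
    back = []                # back-pointers, one list per later segment

    for row in scores[1:]:
        new_best, pointers = [], []
        for s in range(n_slides):
            # Staying on slide s is free; arriving from another slide costs the penalty.
            def arrival(p, s=s):
                return best[p] - (switch_penalty if p != s else 0.0)
            prev = max(range(n_slides), key=arrival)
            new_best.append(arrival(prev) + row[s])
            pointers.append(prev)
        best = new_best
        back.append(pointers)

    # Trace the best path backwards from the best final slide.
    s = max(range(n_slides), key=lambda i: best[i])
    path = [s]
    for pointers in reversed(back):
        s = pointers[s]
        path.append(s)
    path.reverse()
    return path

# Toy similarity matrix: the middle segment is slightly ambiguous.
toy_scores = [[0.9, 0.1], [0.4, 0.5], [0.9, 0.1]]
print(align_slides(toy_scores, switch_penalty=0.0))
print(align_slides(toy_scores, switch_penalty=0.3))
```

On the toy matrix, the penalty suppresses a brief jump to the second slide caused by a single noisy segment, which is one plausible reason penalizing transitions can raise accuracy.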

2024-09-25

Large Language Model Predicts Above Normal All India Summer Monsoon Rainfall in 2024

Reliable prediction of the All India Summer Monsoon Rainfall (AISMR) is pivotal for informed policymaking for the country, impacting the lives of billions of people. However, accurate simulation of AISMR has been a persistent challenge due to the complex interplay of various multi-scale factors and the inherent variability of the monsoon system. This research focuses on adapting and fine-tuning the latest LLM model, PatchTST, to accurately predict AISMR with a lead time of three months. The fine-tuned PatchTST model, trained with historical AISMR data, the Niño3.4 index, and categorical Indian Ocean Dipole values, outperforms several popular neural network models and statistical models. This fine-tuned LLM model exhibits an exceptionally low RMSE percentage of 0.07% and a Spearman correlation of 0.976. This is particularly impressive, since it is nearly 80% more accurate than the best-performing NN models. 0.614 The model predicts an above-normal monsoon for the year 2024, with an accumulated rainfall of 921.6 mm in the months of June-September for the entire country.

link

2024-09-25

CodeInsight: A Curated Dataset of Practical Coding Solutions from Stack Overflow

We introduce a novel dataset tailored for code generation, aimed at aiding developers in common tasks. Our dataset provides examples that include a clarified intent, associated code snippets, and an average of three related unit tests. It encompasses a range of libraries such as Pandas, Numpy, and Regex, along with more than 70 standard libraries in Python code derived from Stack Overflow. Comprising 3,409 examples crafted by Python experts, our dataset is designed for both model finetuning and standalone evaluation. To complete the unit test evaluation, we categorize examples in order to get a more fine-grained analysis, enhancing the understanding of models' strengths and weaknesses in specific coding tasks. The examples have been refined to reduce data contamination, a process confirmed by the performance of three leading models: Mistral 7B, CodeLLaMa 13B, and Starcoder 15B. We further investigate data contamination by testing GPT-4 performance on a part of our dataset. The benchmark can be accessed at https://github.com/NathanaelBeau/CodeInsight. 0.703

link

2024-09-25

Shifting from endangerment to rebirth in the Artificial Intelligence Age: An Ensemble Machine Learning Approach for Hawrami Text Classification

Hawrami, a dialect of Kurdish, is classified as an endangered language as it suffers from the scarcity of data and the gradual loss of its speakers. Natural Language Processing projects can be used to partially compensate for data availability for endangered languages/dialects through a variety of approaches, such as machine translation, language model building, and corpora development. Similarly, NLP projects such as text classification are in language documentation. Several text classification studies have been conducted for Kurdish, but they were mainly dedicated to two particular dialects: Sorani (Central Kurdish) and Kurmanji (Northern Kurdish). In this paper, we introduce various text classification models using a dataset of 6,854 articles in Hawrami labeled into 15 categories by two native speakers. We use K-nearest Neighbor (KNN), Linear Support Vector Machine (Linear SVM), Logistic Regression (LR), and Decision Tree (DT) to evaluate how well those methods perform the classification task. The results indicate that the Linear SVM achieves 96% accuracy and outperforms the other approaches. 0.612

link

2024-09-25

DALDA: Data Augmentation Leveraging Diffusion Model and LLM with Adaptive Guidance Scaling

In this paper, we present an effective data augmentation framework leveraging the Large Language Model (LLM) and Diffusion Model (DM) to tackle the challenges inherent in data-scarce scenarios.Recently, DMs have opened up the possibility of generating synthetic images to complement a few training images.However, increasing the diversity of synthetic images also raises the risk of generating samples outside the target distribution.Our approach addresses this issue by embedding novel semantic information into text prompts via LLM and utilizing real images as visual prompts, thus generating semantically rich images.To ensure that the generated images remain within the target distribution, we dynamically adjust the guidance weight based on each image's CLIPScore to control the diversity.Experimental results show that our method produces synthetic images with enhanced diversity while maintaining adherence to the target distribution.Consequently, our approach proves to be more efficient in the few-shot setting on several benchmarks. 0.705Our code is available at https://github.com/kkyuhun94/dalda .

link

2024-09-25

ControlCity: A Multimodal Diffusion Model Based Approach for Accurate Geospatial Data Generation and Urban Morphology Analysis

Volunteer Geographic Information (VGI), with its rich variety, large volume, rapid updates, and diverse sources, has become a critical source of geospatial data.However, VGI data from platforms like OSM exhibit significant quality heterogeneity across different data types, particularly with urban building data.To address this, we propose a multi-source geographic data transformation solution, utilizing accessible and complete VGI data to assist in generating urban building footprint data.We also employ a multimodal data generation framework to improve accuracy.First, we introduce a pipeline for constructing an 'image-text-metadata-building footprint' dataset, primarily based on road network data and supplemented by other multimodal data.We then present ControlCity, a geographic data transformation method based on a multimodal diffusion model.This method first uses a pre-trained text-to-image model to align text, metadata, and building footprint data.An improved ControlNet further integrates road network and land-use imagery, producing refined building footprint data.Experiments across 22 global cities demonstrate that ControlCity successfully simulates real urban building patterns, achieving state-of-the-art performance.Specifically, our method achieves an average FID score of 50.94, reducing error by 71.01% compared to leading methods, and a MIoU score of 0.36, an improvement of 38.46%. 0.675Additionally, our model excels in tasks like urban morphology transfer, zero-shot city generation, and spatial data completeness assessment.In the zero-shot city task, our method accurately predicts and generates similar urban structures, demonstrating strong generalization.This study confirms the effectiveness of our approach in generating urban building footprint data and capturing complex city characteristics.

link

2024-09-25

Degradation-Guided One-Step Image Super-Resolution with Diffusion Priors

Diffusion-based image super-resolution (SR) methods have achieved remarkable success by leveraging large pre-trained text-to-image diffusion models as priors.However, these methods still face two challenges: the requirement for dozens of sampling steps to achieve satisfactory results, which limits efficiency in real scenarios, and the neglect of degradation models, which are critical auxiliary information in solving the SR problem.In this work, we introduced a novel one-step SR model, which significantly addresses the efficiency issue of diffusion-based SR methods.Unlike existing fine-tuning strategies, we designed a degradation-guided Low-Rank Adaptation (LoRA) module specifically for SR, which corrects the model parameters based on the pre-estimated degradation information from low-resolution images.This module not only facilitates a powerful data-dependent or degradation-dependent SR model but also preserves the generative prior of the pre-trained diffusion model as much as possible.Furthermore, we tailor a novel training pipeline by introducing an online negative sample generation strategy.Combined with the classifier-free guidance strategy during inference, it largely improves the perceptual quality of the super-resolution results.Extensive experiments have demonstrated the superior efficiency and effectiveness of the proposed model compared to recent state-of-the-art methods. 0.789

link

2024-09-25

Text2CAD: Generating Sequential CAD Models from Beginner-to-Expert Level Text Prompts

Prototyping complex computer-aided design (CAD) models in modern software can be very time-consuming. This is due to the lack of intelligent systems that can quickly generate simpler intermediate parts. We propose Text2CAD, the first AI framework for generating text-to-parametric CAD models using designer-friendly instructions for all skill levels. Furthermore, we introduce a data annotation pipeline for generating text prompts based on natural language instructions for the DeepCAD dataset using Mistral and LLaVA-NeXT. The dataset contains ~170K models and ~660K text annotations, from abstract CAD descriptions (e.g., generate two concentric cylinders) to detailed specifications (e.g., draw two circles with center (x, y) and radius r1, r2, and extrude along the normal by d...). Within the Text2CAD framework, we propose an end-to-end transformer-based auto-regressive network to generate parametric CAD models from input texts. We evaluate the performance of our model through a mixture of metrics, including visual quality, parametric precision, and geometrical accuracy. 0.703 Our proposed framework shows great potential in AI-aided design applications. Our source code and annotations will be publicly available.

link

2024-09-24

Provably Efficient Exploration in Inverse Constrained Reinforcement Learning

To obtain the optimal constraints in complex environments, Inverse Constrained Reinforcement Learning (ICRL) seeks to recover these constraints from expert demonstrations in a data-driven manner.Existing ICRL algorithms collect training samples from an interactive environment.However, the efficacy and efficiency of these sampling strategies remain unknown.To bridge this gap, we introduce a strategic exploration framework with provable efficiency.Specifically, we define a feasible constraint set for ICRL problems and investigate how expert policy and environmental dynamics influence the optimality of constraints.Motivated by our findings, we propose two exploratory algorithms to achieve efficient constraint inference via 1) dynamically reducing the bounded aggregate error of cost estimation and 2) strategically constraining the exploration policy.Both algorithms are theoretically grounded with tractable sample complexity.We empirically demonstrate the performance of our algorithms under various environments. 0.667

link

2024-09-24

AutoCE: An Accurate and Efficient Model Advisor for Learned Cardinality Estimation

Cardinality estimation (CE) plays a crucial role in many database-related tasks such as query generation, cost estimation, and join ordering.Lately, we have witnessed the emergence of numerous learned CE models.However, no single CE model is invincible when it comes to the datasets with various data distributions.To facilitate data-intensive applications with accurate and efficient cardinality estimation, it is important to have an approach that can judiciously and efficiently select the most suitable CE model for an arbitrary dataset. In this paper, we study a new problem of selecting the best CE models for a variety of datasets.This problem is rather challenging as it is hard to capture the relationship from various datasets to the performance of disparate models.To address this problem, we propose a model advisor, named AutoCE, which can adaptively select the best model for a dataset.The main contribution of AutoCE is the learning-based model selection, where deep metric learning is used to learn a recommendation model and incremental learning is proposed to reduce the labeling overhead and improve the model robustness.We have integrated AutoCE into PostgreSQL and evaluated its impact on query optimization.The results showed that AutoCE achieved the best performance (27% better) and outperformed the baselines concerning accuracy (2.1 times better) and efficacy (4.2 times better). 0.623

link

2024-09-24

Learning with Confidence: Training Better Classifiers from Soft Labels

In supervised machine learning, models are typically trained using data with hard labels, i.e., definite assignments of class membership.This traditional approach, however, does not take the inherent uncertainty in these labels into account.We investigate whether incorporating label uncertainty, represented as discrete probability distributions over the class labels -- known as soft labels -- improves the predictive performance of classification models.We first demonstrate the potential value of soft label learning (SLL) for estimating model parameters in a simulation experiment, particularly for limited sample sizes and imbalanced data.Subsequently, we compare the performance of various wrapper methods for learning from both hard and soft labels using identical base classifiers.On real-world-inspired synthetic data with clean labels, the SLL methods consistently outperform hard label methods.Since real-world data is often noisy and precise soft labels are challenging to obtain, we study the effect that noisy probability estimates have on model performance.Alongside conventional noise models, our study examines four types of miscalibration that are known to affect human annotators.The results show that SLL methods outperform the hard label methods in the majority of settings. 0.669Finally, we evaluate the methods on a real-world dataset with confidence scores, where the SLL methods are shown to match the traditional methods for predicting the (noisy) hard labels while providing more accurate confidence estimates.

link
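To make the hard-label versus soft-label distinction above concrete, here is a small sketch of cross-entropy computed against each kind of target. It only shows how a soft label enters a loss; it is not one of the wrapper methods compared in the paper, and the example distributions are made up.

```python
import math

def cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy between a target distribution and predicted probabilities."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, predicted))

hard_label = [1.0, 0.0]   # annotator's top choice only
soft_label = [0.7, 0.3]   # same annotator, with the stated uncertainty kept

confident = [0.99, 0.01]  # an over-confident prediction
calibrated = [0.7, 0.3]   # a prediction that matches the stated uncertainty

for name, target in (("hard", hard_label), ("soft", soft_label)):
    print(name, cross_entropy(target, confident), cross_entropy(target, calibrated))
```

With the hard target the over-confident prediction gets the lower loss, while with the soft target the calibrated prediction does, so training on soft labels rewards matching the annotator's uncertainty.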

2024-09-24

Leveraging Mixture of Experts for Improved Speech Deepfake Detection

Speech deepfakes pose a significant threat to personal security and content authenticity.Several detectors have been proposed in the literature, and one of the primary challenges these systems have to face is the generalization over unseen data to identify fake signals across a wide range of datasets.In this paper, we introduce a novel approach for enhancing speech deepfake detection performance using a Mixture of Experts architecture.The Mixture of Experts framework is well-suited for the speech deepfake detection task due to its ability to specialize in different input types and handle data variability efficiently.This approach offers superior generalization and adaptability to unseen data compared to traditional single models or ensemble methods.Additionally, its modular structure supports scalable updates, making it more flexible in managing the evolving complexity of deepfake techniques while maintaining high detection accuracy.We propose an efficient, lightweight gating mechanism to dynamically assign expert weights for each input, optimizing detection performance.Experimental results across multiple datasets demonstrate the effectiveness and potential of our proposed approach. 0.68

link
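
The gating idea in the abstract above (assigning a weight to each expert for every input) can be sketched generically as a softmax over gate logits followed by a weighted sum of expert outputs. The paper's actual gating mechanism is not specified here, so `softmax`, `mixture_score`, and the numbers below are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mixture_score(gate_logits, expert_scores):
    """Combine per-expert fake/real scores using input-dependent gate weights."""
    weights = softmax(gate_logits)
    return sum(w * s for w, s in zip(weights, expert_scores))

# Hypothetical input: the gate prefers the second of three experts.
gate_logits = [0.1, 2.0, -1.0]
expert_scores = [0.3, 0.9, 0.5]   # each expert's deepfake probability
combined = mixture_score(gate_logits, expert_scores)
```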

2024-09-24

Implicit assessment of language learning during practice as accurate as explicit testing

Assessment of proficiency of the learner is an essential part of Intelligent Tutoring Systems (ITS). We use Item Response Theory (IRT) in computer-aided language learning for assessment of student ability in two contexts: in test sessions, and in exercises during practice sessions. Exhaustive testing across a wide range of skills can provide a detailed picture of proficiency, but may be undesirable for a number of reasons. Therefore, we first aim to replace exhaustive tests with efficient but accurate adaptive tests. 0.612 We use learner data collected from exhaustive tests under imperfect conditions, to train an IRT model to guide adaptive tests. Simulations and experiments with real learner data confirm that this approach is efficient and accurate. Second, we explore whether we can accurately estimate learner ability directly from the context of practice with exercises, without testing. We transform learner data collected from exercise sessions into a form that can be used for IRT modeling. This is done by linking the exercises to linguistic constructs; the constructs are then treated as "items" within IRT. We present results from large-scale studies with thousands of learners. Using teacher assessments of student ability as "ground truth," we compare the estimates obtained from tests vs. those from exercises. The experiments confirm that the IRT models can produce accurate ability estimation based on exercises.

link
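
The abstract above does not say which IRT model is used. As a minimal sketch, assuming the common two-parameter logistic (2PL) form, the probability that a learner with ability `theta` answers an item correctly could be computed as follows; the parameter names and example values are illustrative assumptions.

```python
import math

def p_correct(theta, difficulty, discrimination=1.0):
    """2PL IRT: probability of a correct response.

    theta          -- learner ability
    difficulty     -- item difficulty (b)
    discrimination -- item discrimination (a); a = 1 gives the 1PL/Rasch form
    """
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))

# Hypothetical item, e.g. one linguistic construct treated as an "item".
easy_for_learner = p_correct(theta=1.5, difficulty=0.0)
hard_for_learner = p_correct(theta=-1.0, difficulty=0.5)
```

When ability equals difficulty the probability is exactly 0.5, which is why adaptive tests tend to pick items near the current ability estimate.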

2024-09-24

The anonymization problem in social networks

In this paper we introduce a general version of the anonymization problem in social networks, in which the goal is to maximize the number of anonymous nodes by altering a given graph.We define three variants of this optimization problem, being full, partial and budgeted anonymization.In each, the objective is to maximize the number of k-anonymous nodes, i.e., nodes for which there are at least k-1 equivalent nodes, according to a particular anonymity measure of structural node equivalence.We propose six new heuristic algorithms for solving the anonymization problem which we implement into the reusable ANO-NET computational framework.As a baseline, we use an edge sampling method introduced in previous work. 0.648Experiments on both graph models and 17 real-world network datasets result in three empirical findings.First, we demonstrate that edge deletion is the most effective graph alteration operation.Second, we compare four commonly used anonymity measures from the literature and highlight how the choice of anonymity measure has a tremendous effect on both the achieved anonymity as well as the difficulty of solving the anonymization problem.Third, we find that the proposed algorithms that preferentially delete edges with a larger effect on nodes at a structurally unique position consistently outperform heuristics solely based on network structure.With similar runtimes, our algorithms retain on average 17 times more edges, ensuring higher data utility after full anonymization.In the budgeted variant, they achieve 4.4 times more anonymous nodes than the baseline.This work lays important foundations for future development of algorithms for anonymizing social networks.

link
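
The k-anonymity objective in the abstract above (a node is k-anonymous if at least k-1 other nodes are equivalent to it) can be sketched as below. The paper compares four anonymity measures; this example assumes the simplest one, node degree, and the function name `count_k_anonymous` is hypothetical rather than part of ANO-NET.

```python
from collections import Counter

def count_k_anonymous(nodes, edges, k):
    """Count nodes that share their degree with at least k-1 other nodes."""
    degree = {n: 0 for n in nodes}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # Size of each equivalence class, where equivalence means equal degree.
    class_sizes = Counter(degree.values())
    return sum(1 for n in nodes if class_sizes[degree[n]] >= k)

nodes = ["a", "b", "c", "d"]
star = [("c", "a"), ("c", "b"), ("c", "d")]   # the hub "c" has a unique degree
path = [("a", "b"), ("b", "c"), ("c", "d")]   # degrees 1, 2, 2, 1

star_anonymous = count_k_anonymous(nodes, star, k=2)
path_anonymous = count_k_anonymous(nodes, path, k=2)
```

In the star graph only the three leaves are 2-anonymous, while in the path graph all four nodes are. An anonymization heuristic searches for graph alterations that move the count in this direction.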

2024-09-24

CJEval: A Benchmark for Assessing Large Language Models Using Chinese Junior High School Exam Data

Online education platforms have significantly transformed the dissemination of educational resources by providing a dynamic and digital infrastructure.With the further enhancement of this transformation, the advent of Large Language Models (LLMs) has elevated the intelligence levels of these platforms.However, current academic benchmarks provide limited guidance for real-world industry scenarios.This limitation arises because educational applications require more than mere test question responses.To bridge this gap, we introduce CJEval, a benchmark based on Chinese Junior High School Exam Evaluations. 0.618CJEval consists of 26,136 samples across four application-level educational tasks covering ten subjects.These samples include not only questions and answers but also detailed annotations such as question types, difficulty levels, knowledge concepts, and answer explanations.By utilizing this benchmark, we assessed LLMs' potential applications and conducted a comprehensive analysis of their performance by fine-tuning on various educational tasks.Extensive experiments and discussions have highlighted the opportunities and challenges of applying LLMs in the field of education.

link

2024-09-24

Fast Extrinsic Calibration for Multiple Inertial Measurement Units in Visual-Inertial System

In this paper, we propose a fast extrinsic calibration method for fusing multiple inertial measurement units (MIMU) to improve visual-inertial odometry (VIO) localization accuracy.Currently, data fusion algorithms for MIMU highly depend on the number of inertial sensors.Based on the assumption that extrinsic parameters between inertial sensors are perfectly calibrated, the fusion algorithm provides better localization accuracy with more IMUs, while neglecting the effect of extrinsic calibration error.Our method builds two non-linear least-squares problems to estimate the MIMU relative position and orientation separately, independent of external sensors and inertial noises online estimation.Then we give the general form of the virtual IMU (VIMU) method and propose its propagation on manifold.We perform our method on datasets, our self-made sensor board, and board with different IMUs, validating the superiority of our method over competing methods concerning speed, accuracy, and robustness. 0.624In the simulation experiment, we show that only fusing two IMUs with our calibration method to predict motion can rival nine IMUs.Real-world experiments demonstrate better localization accuracy of the VIO integrated with our calibration method and VIMU propagation on manifold.

link

Developer Research

2024-09-25

APILOT: Navigating Large Language Models to Generate Secure Code by Sidestepping Outdated API Pitfalls

With the rapid development of large language models (LLMs), their applications have expanded into diverse fields, such as code assistance. However, the substantial size of LLMs makes their training highly resource- and time-intensive, rendering frequent retraining or updates impractical. Consequently, time-sensitive data can become outdated, potentially misleading LLMs in time-aware tasks. For example, new vulnerabilities are discovered in various programs every day. Without updating their knowledge, LLMs may inadvertently generate code that includes these newly discovered vulnerabilities. Current strategies, such as prompt engineering and fine-tuning, do not effectively address this issue. To address this issue, we propose a solution, named APILOT, which maintains a real-time, quickly updatable dataset of outdated APIs. Additionally, APILOT utilizes an augmented generation method that leverages this dataset to navigate LLMs in generating secure, version-aware code. We conducted a comprehensive evaluation to measure the effectiveness of APILOT in reducing the incidence of outdated API recommendations across seven different state-of-the-art LLMs. The evaluation results indicate that APILOT can reduce outdated code recommendations by 89.42% on average with limited performance overhead. Interestingly, while enhancing security, APILOT also improves the usability of the code generated by LLMs, showing an average increase of 27.54% in usability. This underscores APILOT's dual capability to enhance both the safety and practical utility of code suggestions in contemporary software development environments. 0.682

link

2024-09-25

CodeInsight: A Curated Dataset of Practical Coding Solutions from Stack Overflow

We introduce a novel dataset tailored for code generation, aimed at aiding developers in common tasks. 0.617 Our dataset provides examples that include a clarified intent, associated code snippets, and an average of three related unit tests. It encompasses a range of libraries such as Pandas, Numpy, and Regex, along with more than 70 standard libraries in Python code derived from Stack Overflow. Comprising 3,409 examples crafted by Python experts, our dataset is designed for both model finetuning and standalone evaluation. To complete unit tests evaluation, we categorize examples in order to get more fine-grained analysis, enhancing the understanding of models' strengths and weaknesses in specific coding tasks. The examples have been refined to reduce data contamination, a process confirmed by the performance of three leading models: Mistral 7B, CodeLLaMa 13B, and Starcoder 15B. We further investigate data contamination by testing GPT-4 performance on a part of our dataset. The benchmark can be accessed at https://github.com/NathanaelBeau/CodeInsight.

link

2024-09-17

Leveraging Reviewer Experience in Code Review Comment Generation

Modern code review is a ubiquitous software quality assurance process aimed at identifying potential issues within newly written code. 0.65 Despite its effectiveness, the process demands large amounts of effort from the human reviewers involved. To help alleviate this workload, researchers have trained deep learning models to imitate human reviewers in providing natural language code reviews. Formally, this task is known as code review comment generation. Prior work has demonstrated improvements in this task by leveraging machine learning techniques and neural models, such as transfer learning and the transformer architecture. However, the quality of the model-generated reviews remains sub-optimal due to the quality of the open-source code review data used in model training. This is in part due to the data obtained from open-source projects where code reviews are conducted in a public forum, and reviewers possess varying levels of software development experience, potentially affecting the quality of their feedback. To accommodate this variation, we propose a suite of experience-aware training methods that utilise the reviewers' past authoring and reviewing experiences as signals for review quality. Specifically, we propose experience-aware loss functions (ELF), which use the reviewers' authoring and reviewing ownership of a project as weights in the model's loss function. Through this method, experienced reviewers' code reviews yield larger influence over the model's behaviour. Compared to the SOTA model, ELF was able to generate higher quality reviews in terms of accuracy, informativeness, and comment types generated. The key contribution of this work is the demonstration of how traditional software engineering concepts such as reviewer experience can be integrated into the design of AI-based automated code review models.

link
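
The abstract above describes weighting the training loss by reviewer experience. A minimal sketch of that general idea follows, assuming per-example losses are already computed and that the weights are normalized by their sum. Both the normalization and the function name `experience_weighted_loss` are assumptions, not the paper's exact ELF formulation.

```python
def experience_weighted_loss(sample_losses, experience_weights):
    """Weighted mean of per-example losses.

    sample_losses      -- loss of each review example (e.g. token-level NLL)
    experience_weights -- hypothetical per-example weights derived from the
                          reviewer's authoring/reviewing ownership
    """
    if len(sample_losses) != len(experience_weights):
        raise ValueError("losses and weights must have the same length")
    total_weight = sum(experience_weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(w * l for w, l in zip(experience_weights, sample_losses))
    return weighted / total_weight

# Two reviews: the first written by a more experienced reviewer.
losses = [1.0, 3.0]
uniform = experience_weighted_loss(losses, [1.0, 1.0])
experience_aware = experience_weighted_loss(losses, [3.0, 1.0])
```

With uniform weights both reviews count equally. With the higher weight on the first review, its loss dominates the batch loss, so gradient updates are pulled towards fitting that reviewer's comments.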

Data Annotation Techniques

Causality Research

2024-09-26

From News to Forecast: Integrating Event Analysis in LLM-Based Time Series Forecasting with Reflection

This paper introduces a novel approach to enhance time series forecasting using Large Language Models (LLMs) and Generative Agents.With language as a medium, our method adaptively integrates various social events into forecasting models, aligning news content with time series fluctuations for enriched insights.Specifically, we utilize LLM-based agents to iteratively filter out irrelevant news and employ human-like reasoning and reflection to evaluate predictions.This enables our model to analyze complex events, such as unexpected incidents and shifts in social behavior, and continuously refine the selection logic of news and the robustness of the agent's output. 0.514By compiling selected news with time series data, we fine-tune the LLaMa2 pre-trained model.The results demonstrate significant improvements in forecasting accuracy and suggest a potential paradigm shift in time series forecasting by effectively harnessing unstructured news data.

link

2024-09-26

Dataset Distillation-based Hybrid Federated Learning on Non-IID Data

In federated learning, the heterogeneity of client data has a great impact on the performance of model training.Many heterogeneity issues in this process are raised by non-independently and identically distributed (Non-IID) data. 0.564This study focuses on the issue of label distribution skew.To address it, we propose a hybrid federated learning framework called HFLDD, which integrates dataset distillation to generate approximately independent and equally distributed (IID) data, thereby improving the performance of model training.Particularly, we partition the clients into heterogeneous clusters, where the data labels among different clients within a cluster are unbalanced while the data labels among different clusters are balanced.The cluster headers collect distilled data from the corresponding cluster members, and conduct model training in collaboration with the server.This training process is like traditional federated learning on IID data, and hence effectively alleviates the impact of Non-IID data on model training.Furthermore, we compare our proposed method with typical baseline methods on public datasets.Experimental results demonstrate that when the data labels are severely imbalanced, the proposed HFLDD outperforms the baseline methods in terms of both test accuracy and communication cost.

link

2024-09-26

Modulated Intervention Preference Optimization (MIPO): Keep the Easy, Refine the Difficult

Preference optimization methods typically begin training with a well-trained SFT model as a reference model. In RLHF and DPO, a regularization term is used during the preference optimization process to prevent the policy model from deviating too far from the reference model's distribution, thereby avoiding the generation of anomalous responses. When the reference model is already well-aligned with the given data or only requires slight adjustments, this approach can produce a well-aligned model. However, if the reference model is not aligned with the given data and requires significant deviation from its current state, a regularization term may actually hinder the model alignment. In this study, we propose Modulated Intervention Preference Optimization (MIPO) to address this issue. MIPO modulates the degree of intervention from the reference model based on how well the given data is aligned with it. 0.601 If the data is well-aligned, the intervention is increased to prevent the policy model from diverging significantly from the reference model. 0.568 Conversely, if the alignment is poor, the interference is reduced to facilitate more extensive training. We compare the performance of MIPO and DPO using Mistral-7B and Llama3-8B in Alpaca Eval 2.0 and MT-Bench. The experimental results demonstrate that MIPO consistently outperforms DPO across various evaluation scenarios.

link

2024-09-26

Dynamic Subframe Splitting and Spatio-Temporal Motion Entangled Sparse Attention for RGB-E Tracking

Event-based bionic cameras asynchronously capture dynamic scenes with high temporal resolution and high dynamic range, offering potential for the integration of events and RGB under conditions of illumination degradation and fast motion. Existing RGB-E tracking methods model event characteristics utilising the attention mechanism of the Transformer before integrating both modalities. Nevertheless, these methods involve aggregating the event stream into a single event frame, lacking the utilisation of the temporal information inherent in the event stream. 0.61 Moreover, the traditional attention mechanism is well-suited for dense semantic features, while the attention mechanism for sparse event features requires revolution. In this paper, we propose a dynamic event subframe splitting strategy to split the event stream into more fine-grained event clusters, aiming to capture spatio-temporal features that contain motion cues. Based on this, we design an event-based sparse attention mechanism to enhance the interaction of event features in temporal and spatial dimensions. The experimental results indicate that our method outperforms existing state-of-the-art methods on the FE240 and COESOT datasets, providing an effective processing manner for the event data.

link

2024-09-26

Multimodal Banking Dataset: Understanding Client Needs through Event Sequences

Financial organizations collect a huge amount of data about clients that typically has a temporal (sequential) structure and is collected from various sources (modalities). 0.512Due to privacy issues, there are no large-scale open-source multimodal datasets of event sequences, which significantly limits the research in this area.In this paper, we present the industrial-scale publicly available multimodal banking dataset, MBD, that contains more than 1.5M corporate clients with several modalities: 950M bank transactions, 1B geo position events, 5M embeddings of dialogues with technical support and monthly aggregated purchases of four bank's products.All entries are properly anonymized from real proprietary bank data.Using this dataset, we introduce a novel benchmark with two business tasks: campaigning (purchase prediction in the next month) and matching of clients.We provide numerical results that demonstrate the superiority of our multi-modal baselines over single-modal techniques for each task.As a result, the proposed dataset can open new perspectives and facilitate the future development of practically important large-scale multimodal algorithms for event sequences. HuggingFace Link: https://huggingface.co/datasets/ai-lab/MBD Github Link: https://github.com/Dzhambo/MBD

link

2024-09-26

Diversity-Driven Synthesis: Enhancing Dataset Distillation through Directed Weight Adjustment

The sharp increase in data-related expenses has motivated research into condensing datasets while retaining the most informative features. 0.519 Dataset distillation has thus recently come to the fore. This paradigm generates synthetic datasets that are representative enough to replace the original dataset in training a neural network. To avoid redundancy in these synthetic datasets, it is crucial that each element contains unique features and remains diverse from others during the synthesis stage. In this paper, we provide a thorough theoretical and empirical analysis of diversity within synthesized datasets. We argue that enhancing diversity can improve the parallelizable yet isolated synthesizing approach. Specifically, we introduce a novel method that employs dynamic and directed weight adjustment techniques to modulate the synthesis process, thereby maximizing the representativeness and diversity of each synthetic instance. Our method ensures that each batch of synthetic data mirrors the characteristics of a large, varying subset of the original dataset. Extensive experiments across multiple datasets, including CIFAR, Tiny-ImageNet, and ImageNet-1K, demonstrate the superior performance of our method, highlighting its effectiveness in producing diverse and representative synthetic datasets with minimal computational expense.

link

2024-09-26

Preserving logical and functional dependencies in synthetic tabular data

Dependencies among attributes are a common aspect of tabular data.However, whether existing tabular data generation algorithms preserve these dependencies while generating synthetic data is yet to be explored.In addition to the existing notion of functional dependencies, we introduce the notion of logical dependencies among the attributes in this article.Moreover, we provide a measure to quantify logical dependencies among attributes in tabular data. 0.53Utilizing this measure, we compare several state-of-the-art synthetic data generation algorithms and test their capability to preserve logical and functional dependencies on several publicly available datasets.We demonstrate that currently available synthetic tabular data generation algorithms do not fully preserve functional dependencies when they generate synthetic datasets.In addition, we also showed that some tabular synthetic data generation models can preserve inter-attribute logical dependencies.Our review and comparison of the state-of-the-art reveal research needs and opportunities to develop task-specific synthetic tabular data generation models.

link

2024-09-26

Harnessing Shared Relations via Multimodal Mixup Contrastive Learning for Multimodal Classification

Deep multimodal learning has shown remarkable success by leveraging contrastive learning to capture explicit one-to-one relations across modalities.However, real-world data often exhibits shared relations beyond simple pairwise associations. 0.56We propose M3CoL, a Multimodal Mixup Contrastive Learning approach to capture nuanced shared relations inherent in multimodal data.Our key contribution is a Mixup-based contrastive loss that learns robust representations by aligning mixed samples from one modality with their corresponding samples from other modalities thereby capturing shared relations between them.For multimodal classification tasks, we introduce a framework that integrates a fusion module with unimodal prediction modules for auxiliary supervision during training, complemented by our proposed Mixup-based contrastive loss.Through extensive experiments on diverse datasets (N24News, ROSMAP, BRCA, and Food-101), we demonstrate that M3CoL effectively captures shared multimodal relations and generalizes across domains.It outperforms state-of-the-art methods on N24News, ROSMAP, and BRCA, while achieving comparable performance on Food-101.Our work highlights the significance of learning shared relations for robust multimodal learning, opening up promising avenues for future research.

link

2024-09-26

Detecting and Measuring Confounding Using Causal Mechanism Shifts

Detecting and measuring confounding effects from data is a key challenge in causal inference. 0.82Existing methods frequently assume causal sufficiency, disregarding the presence of unobserved confounding variables. 0.838Causal sufficiency is both unrealistic and empirically untestable. 0.782Additionally, existing methods make strong parametric assumptions about the underlying causal generative process to guarantee the identifiability of confounding variables. 0.814Relaxing the causal sufficiency and parametric assumptions and leveraging recent advancements in causal discovery and confounding analysis with non-i.i.d. data, we propose a comprehensive approach for detecting and measuring confounding. 0.875We consider various definitions of confounding and introduce tailored methodologies to achieve three objectives: (i) detecting and measuring confounding among a set of variables, (ii) separating observed and unobserved confounding effects, and (iii) understanding the relative strengths of confounding bias between different sets of variables. 0.624We present useful properties of a confounding measure and present measures that satisfy those properties.Empirical results support the theoretical analysis.

link

2024-09-26

A method for identifying causality in the response of nonlinear dynamical systems

Predicting the response of nonlinear dynamical systems subject to random, broadband excitation is important across a range of scientific disciplines, such as structural dynamics and neuroscience.Building data-driven models requires experimental measurements of the system input and output, but it can be difficult to determine whether inaccuracies in the model stem from modelling errors or noise.This paper presents a novel method to identify the causal component of the input-output data from measurements of a system in the presence of output noise, as a function of frequency, without needing a high fidelity model. 0.733An output prediction, calculated using an available model, is optimally combined with noisy measurements of the output to predict the input to the system.The parameters of the algorithm balance the two output signals and are utilised to calculate a nonlinear coherence metric as a measure of causality. 0.628This method is applicable to a broad class of nonlinear dynamical systems.There are currently no solutions to this problem in the absence of a complete benchmark model.

link

2024-09-26

Adaptive Stream Processing on Edge Devices through Active Inference

The current scenario of IoT is witnessing a constant increase in the volume of data, which is generated in a constant stream, calling for novel architectural and logical solutions for processing it. Moving the data handling towards the edge of the computing spectrum guarantees better distribution of load and, in principle, lower latency and better privacy. However, managing such a structure is complex, especially when requirements, also referred to as Service Level Objectives (SLOs), specified by applications' owners and infrastructure managers need to be ensured. Despite the rich number of proposals of Machine Learning (ML) based management solutions, researchers and practitioners still struggle to guarantee long-term prediction and control, and accurate troubleshooting. Therefore, we present a novel ML paradigm based on Active Inference (AIF) -- a concept from neuroscience that describes how the brain constantly predicts and evaluates sensory information to decrease long-term surprise. We implement it and evaluate it in a heterogeneous real stream processing use case, where an AIF-based agent continuously optimizes the fulfillment of three SLOs for three autonomous driving services running on multiple devices. The agent used causal knowledge to gradually develop an understanding of how its actions are related to requirements fulfillment, and which configurations to favor. 0.674 Through this approach, our agent requires up to thirty iterations to converge to the optimal solution, showing the capability of offering accurate results in a short amount of time. Furthermore, thanks to AIF and its causal structures, our method guarantees full transparency on the decision making, making the interpretation of the results and the troubleshooting effortless. 0.679

link

2024-09-26

Enhancing elusive clues in knowledge learning by contrasting attention of language models

Causal language models acquire a vast amount of knowledge from general text corpora during pretraining, but the efficiency of knowledge learning is known to be unsatisfactory, especially when learning from knowledge-dense and small-sized corpora. 0.672 The deficiency can come from long-distance dependencies which are hard to capture by language models, and overfitting to co-occurrence patterns and distracting clues in the training text. To address these issues, the paper proposes a method to enhance knowledge learning during language model pretraining, by enhancing elusive but important clues in text discovered by the language models themselves. We found that larger language models pay more attention to non-obvious but important clues, which are often overlooked by smaller language models. Therefore, we can identify these clues by contrasting the attention weights of large and small language models. We use the identified clues as a guide to perform token-dropout data augmentation on the training text, and observed a significant boost in both small and large models' performance in fact memorization. This shows that the behavior contrast between more and less-performant language models contains important clues for knowledge learning, and it can be "amplified" for a straightforward improvement in knowledge learning efficiency.

link

2024-09-25

Cyber Food Swamps: Investigating the Impacts of Online-to-Offline Food Delivery Platforms on Healthy Food Choices

Online-to-offline (O2O) food delivery platforms have substantially enriched the food choices of urban residents by allowing them to conveniently access farther food outlets. However, concerns about the healthiness of delivered food persist, especially because the impact of O2O food delivery platforms on users' healthy food choices remains unclear. This study leverages large-scale empirical data from a leading O2O delivery platform to comprehensively analyze online food choice behaviors and how they are influenced by the online exposure to fast food restaurants, i.e., the online food environment. Our analyses reveal significant discrepancies in food preferences across demographic groups and city sizes, where male, low-income, and younger users and those located in larger cities are more likely to order fast food via O2O platforms. Besides, we also perform a comparative analysis on the food exposure differences in online and offline environments, confirming that the extended service ranges of O2O platforms can create larger "cyber food swamps". Furthermore, regression analysis highlights that a higher ratio of fast food orders is associated with "cyber food swamps", areas characterized by a higher share of accessible fast food restaurants. A 10% increase in this share raises the probability of ordering fast food by 22.0%. Moreover, a quasi-natural experiment substantiates the long-term causal effect of online food environment changes on healthy food choices. 0.506 Our findings underscore the need for O2O food delivery platforms to address the health implications of online food choice exposure, thereby informing efforts by various stakeholders to improve residents' dietary health.

link

2024-09-25

Random Forest Regression Feature Importance for Climate Impact Pathway Detection

Disturbances to the climate system, both natural and anthropogenic, have far reaching impacts that are not always easy to identify or quantify using traditional climate science analyses or causal modeling techniques. 0.697In this paper, we develop a novel technique for discovering and ranking the chain of spatio-temporal downstream impacts of a climate source, referred to herein as a source-impact pathway, using Random Forest Regression (RFR) and SHapley Additive exPlanation (SHAP) feature importances.Rather than utilizing RFR for classification or regression tasks (the most common use case for RFR), we propose a fundamentally new RFR-based workflow in which we: (i) train random forest (RF) regressors on a set of spatio-temporal features of interest, (ii) calculate their pair-wise feature importances using the SHAP weights associated with those features, and (iii) translate these feature importances into a weighted pathway network (i.e., a weighted directed graph), which can be used to trace out and rank interdependencies between climate features and/or modalities.We adopt a tiered verification approach to verify our new pathway identification methodology.In this approach, we apply our method to ensembles of data generated by running two increasingly complex benchmarks: (i) a set of synthetic coupled equations, and (ii) a fully coupled simulation of the 1991 eruption of Mount Pinatubo in the Philippines performed using a modified version 2 of the U.S. Department of Energy's Energy Exascale Earth System Model (E3SMv2).We find that our RFR feature importance-based approach can accurately detect known pathways of impact for both test cases.

link

2024-09-25

Harnessing Diversity for Important Data Selection in Pretraining Large Language Models

Data selection is of great significance in pre-training large language models, given the variation in quality within the large-scale available training corpora. To achieve this, researchers are currently investigating the use of data influence to measure the importance of data instances, i.e., a high influence score indicates that incorporating this instance into the training set is likely to enhance the model performance. Consequently, they select the top-k instances with the highest scores. However, this approach has several limitations. (1) Computing the influence of all available data is time-consuming. 0.586 (2) The selected data instances are not diverse enough, which may hinder the pre-trained model's ability to generalize effectively to various downstream tasks. In this paper, we introduce Quad, a data selection approach that considers both quality and diversity by using data influence to achieve state-of-the-art pre-training results. In particular, noting that attention layers capture extensive semantic details, we have adapted the accelerated iHVP computation methods for attention layers, enhancing our ability to evaluate the influence of data, i.e., its quality. For the diversity, Quad clusters the dataset into similar data instances within each cluster and diverse instances across different clusters. For each cluster, if we opt to select data from it, we take some samples to evaluate the influence to prevent processing all instances. To determine which clusters to select, we utilize the classic Multi-Armed Bandit method, treating each cluster as an arm. This approach favors clusters with highly influential instances (ensuring high quality) or clusters that have been selected less frequently (ensuring diversity), thereby well balancing between quality and diversity.

link

2024-09-25

DRIM: Learning Disentangled Representations from Incomplete Multimodal Healthcare Data

Real-life medical data is often multimodal and incomplete, fueling the growing need for advanced deep learning models capable of integrating them efficiently. The use of diverse modalities, including histopathology slides, MRI, and genetic data, offers unprecedented opportunities to improve prognosis prediction and to unveil new treatment pathways. Contrastive learning, widely used for deriving representations from paired data in multimodal tasks, assumes that different views contain the same task-relevant information and leverages only shared information. This assumption becomes restrictive when handling medical data since each modality also harbors specific knowledge relevant to downstream tasks. 0.554 We introduce DRIM, a new multimodal method for capturing these shared and unique representations, despite data sparsity. More specifically, given a set of modalities, we aim to encode a representation for each one that can be divided into two components: one encapsulating patient-related information common across modalities and the other encapsulating modality-specific details. This is achieved by increasing the shared information among different patient modalities while minimizing the overlap between shared and unique components within each modality. Our method outperforms state-of-the-art algorithms on glioma patient survival prediction tasks, while being robust to missing modalities. To promote reproducibility, the code is made publicly available at https://github.com/Lucas-rbnt/DRIM

link

2024-09-25

Efficient Feature Interactions with Transformers: Improving User Spending Propensity Predictions in Gaming

Dream11 is a fantasy sports platform that allows users to create their own virtual teams for real-life sports events. We host multiple sports and matches for our 200M+ user base. In this RMG (real money gaming) setting, users pay an entry amount to participate in various contest products that we provide to users. In our current work, we discuss the problem of predicting the user's propensity to spend in a gaming round, so it can be utilized for various downstream applications, e.g., upselling users by incentivizing them marginally as per their spending propensity, or personalizing the product listing based on the user's propensity to spend. We aim to model the spending propensity of each user based on past transaction data. 0.528 In this paper, we benchmark tree-based and deep-learning models that show good results on structured data, and we propose a new architecture change that is specifically designed to capture the rich interactions among the input features. We show that our proposed architecture outperforms the existing models on the task of predicting the user's propensity to spend in a gaming round. Our new transformer model surpasses the state-of-the-art FT-Transformer, improving MAE by 2.5% and MSE by 21.8%.

link

2024-09-25

PACE: marrying generalization in PArameter-efficient fine-tuning with Consistency rEgularization

Parameter-Efficient Fine-Tuning (PEFT) effectively adapts pre-trained vision transformers to downstream tasks. However, the optimization for task performance often comes at the cost of generalizability in fine-tuned models. To address this issue, we theoretically connect smaller weight gradient norms during training and larger datasets to improved model generalization. Motivated by this connection, we propose reducing gradient norms for enhanced generalization and aligning the fine-tuned model with its pre-trained counterpart to retain knowledge from large-scale pre-training data. Yet, naive alignment does not guarantee gradient reduction and can potentially cause gradient explosion, complicating efforts to manage gradients. To address such issues, we propose PACE, marrying generalization of PArameter-efficient fine-tuning with Consistency rEgularization. We perturb features learned from the adapter with multiplicative noise and ensure the fine-tuned model remains consistent for the same sample under different perturbations. Theoretical analysis shows that PACE not only implicitly regularizes gradients for enhanced generalization, but also implicitly aligns the fine-tuned and pre-trained models to retain knowledge. Experimental evidence supports our theories. 0.541 PACE outperforms existing PEFT methods in four visual adaptation tasks: VTAB-1k, FGVC, few-shot learning and domain adaptation. Code will be available at https://github.com/MaxwellYaoNi/PACE

link

2024-09-24

Algorithmic Drift: A Simulation Framework to Study the Effects of Recommender Systems on User Preferences

Digital platforms such as social media and e-commerce websites adopt Recommender Systems to provide value to the user. However, the social consequences deriving from their adoption are still unclear. 0.582 Many scholars argue that recommenders may lead to detrimental effects, such as bias-amplification deriving from the feedback loop between algorithmic suggestions and users' choices. Nonetheless, the extent to which recommenders influence changes in users' leanings remains uncertain. In this context, it is important to provide a controlled environment for evaluating the recommendation algorithm before deployment. To address this, we propose a stochastic simulation framework that mimics user-recommender system interactions in a long-term scenario. In particular, we simulate the user choices by formalizing a user model, which comprises behavioral aspects, such as the user resistance towards the recommendation algorithm and their inertia in relying on the received suggestions. Additionally, we introduce two novel metrics for quantifying the algorithm's impact on user preferences, specifically in terms of drift over time. We conduct an extensive evaluation on multiple synthetic datasets, aiming at testing the robustness of our framework when considering different scenarios and hyper-parameter settings. The experimental results show that the proposed methodology is effective in detecting and quantifying the drift in users' preferences by means of the simulation. All the code and data used to perform the experiments are publicly available.

link

2024-09-24

Linear Contextual Bandits with Interference

Interference, a key concept in causal inference, extends the reward modeling process by accounting for the impact of one unit's actions on the rewards of others. 0.783 In contextual bandit (CB) settings, where multiple units are present in the same round, potential interference can significantly affect the estimation of expected rewards for different arms, thereby influencing the decision-making process. Although some prior work has explored multi-agent and adversarial bandits in interference-aware settings, the effect of interference in CB, as well as the underlying theory, remains significantly underexplored. In this paper, we introduce a systematic framework to address interference in Linear CB (LinCB), bridging the gap between causal inference and online decision-making. 0.739 We propose a series of algorithms that explicitly quantify the interference effect in the reward modeling process and provide comprehensive theoretical guarantees, including sublinear regret bounds, finite sample upper bounds, and asymptotic properties. The effectiveness of our approach is demonstrated through simulations and synthetic data generated from MovieLens data.

link

2024-09-24

Training Data Attribution: Was Your Model Secretly Trained On Data Created By Mine?

The emergence of text-to-image models has recently sparked significant interest, but it is attended by the looming shadow of potential infringement through violations of the user terms. Specifically, an adversary may exploit data created by a commercial model to train their own without proper authorization. To address such risk, it is crucial to investigate the attribution of a suspicious model's training data by determining whether its training data originates, wholly or partially, from a specific source model. 0.509 To trace the generated data, existing methods require applying extra watermarks during either the training or inference phases of the source model. However, these methods are impractical for pre-trained models that have been released, especially when model owners lack security expertise. To tackle this challenge, we propose an injection-free training data attribution method for text-to-image models. It can identify whether a suspicious model's training data stems from a source model, without additional modifications on the source model. The crux of our method lies in the inherent memorization characteristic of text-to-image models. Our core insight is that the memorization of the training dataset is passed down through the data generated by the source model to the model trained on that data, making the source model and the infringing model exhibit consistent behaviors on specific samples. Therefore, our approach involves developing algorithms to uncover these distinct samples and using them as inherent watermarks to verify if a suspicious model originates from the source model. Our experiments demonstrate that our method achieves an accuracy of over 80% in identifying the source of a suspicious model's training data, without interfering with the original training or generation process of the source model.

link

2024-09-24

Facing Asymmetry -- Uncovering the Causal Link between Facial Symmetry and Expression Classifiers using Synthetic Interventions

Understanding expressions is vital for deciphering human behavior, and nowadays, end-to-end trained black-box models achieve high performance. Due to the black-box nature of these models, it is unclear how they behave when applied out-of-distribution. Specifically, these models show decreased performance for unilateral facial palsy patients. We hypothesize that one crucial factor guiding the internal decision rules is facial symmetry. In this work, we use insights from causal reasoning to investigate the hypothesis. 0.848 After deriving a structural causal model, we develop a synthetic interventional framework. 0.809 This approach allows us to analyze how facial symmetry impacts a network's output behavior while keeping other factors fixed. All 17 investigated expression classifiers significantly lower their output activations for reduced symmetry. This result is congruent with observed behavior on real-world data from healthy subjects and facial palsy patients. As such, our investigation serves as a case study for identifying causal factors that influence the behavior of black-box models. 0.722

link

2024-09-18

A novel DFS/BFS approach towards link prediction

Knowledge graphs have been shown to play a significant role in current knowledge mining fields, including life sciences, bioinformatics, computational social sciences, and social network analysis. The problem of link prediction has many applications and has been extensively studied. 0.553 However, most methods are restricted to dimension-reduction, probabilistic-model, or similarity-based approaches and are inherently biased. In this paper, we provide a definition of graph prediction for link prediction and outline related work to support our novel approach, which integrates centrality measures with classical machine learning methods. We examine our experimental results in detail and identify areas for potential further research. Our method shows promise, particularly when utilizing randomly selected nodes and degree centrality.

link

2024-09-18

Investigating team maturity in an agile automotive reorganization

About seven years ago, Volvo Cars initiated a large-scale agile transformation. Amid this journey, a significant restructuring of the R&D department took place. Our study aims to illuminate how team maturity levels are impacted during such comprehensive reorganizations. We collected data from 63 teams to comprehend the effects of organizational changes on these agile teams. Additionally, qualitative data was gathered to validate our findings and explore underlying reasons. 0.517 Contrary to what was expected, the reorganization did not significantly alter the distribution of team maturity. High turnover rates and frequent reorganizations were identified as key reasons why the less mature teams remained in the early stages of team development. Conversely, teams in the second category remained stable at a higher maturity stage, primarily because the teams themselves remained largely intact, with only management structures changing. In conclusion, while reorganizations may hinder some teams' development, others maintain stability at a higher level of maturity despite substantial managerial changes.

link

2024-09-18

Edge-Based Graph Component Pooling

Graph-structured data naturally occurs in many research fields, such as chemistry and sociology. 0.513 The relational information contained therein can be leveraged to statistically model graph properties through geometric deep learning. Graph neural networks employ techniques, such as message-passing layers, to propagate local features through a graph. However, message-passing layers can be computationally expensive when dealing with large and sparse graphs. Graph pooling operators offer the possibility of removing or merging nodes in such graphs, thus lowering computational costs. However, pooling operators that remove nodes cause data loss, and pooling operators that merge nodes are often computationally expensive. We propose a pooling operator that merges nodes so as not to cause data loss but is also conceptually simple and computationally inexpensive. We empirically demonstrate that the proposed pooling operator performs statistically significantly better than edge pool on four popular benchmark datasets while reducing time complexity and the number of trainable parameters by 70.6% on average. Compared to another maximally powerful method named Graph Isomorphism Network, we show that we outperform it on two popular benchmark datasets while reducing the number of learnable parameters on average by 60.9%.
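
To make "merging nodes without data loss" concrete, here is a minimal sketch that merges the connected components induced by a set of selected edges and sums the node features in each component. It is not the paper's operator: how edges are selected and how features are combined are not given in the abstract, so the pre-selected edge list and the sum aggregation are assumptions.

```python
def pool_components(num_nodes, features, selected_edges):
    """Merge nodes joined by selected edges; sum features per component.

    features is a list of per-node feature lists. Returns the pooled
    features and the node -> pooled-node assignment.
    """
    parent = list(range(num_nodes))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in selected_edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_v] = root_u

    roots = sorted({find(i) for i in range(num_nodes)})
    index = {root: i for i, root in enumerate(roots)}

    pooled = [[0.0] * len(features[0]) for _ in roots]
    assignment = []
    for node in range(num_nodes):
        component = index[find(node)]
        assignment.append(component)
        for j, value in enumerate(features[node]):
            pooled[component][j] += value  # every node contributes

    return pooled, assignment


pooled, assignment = pool_components(
    4, [[1.0], [2.0], [3.0], [4.0]], [(0, 1), (2, 3)]
)
```

Because every node contributes to exactly one pooled node, no node's features are discarded, in contrast to pooling operators that drop nodes.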

link

2024-09-18

Spectral clustering of time-evolving networks using the inflated dynamic Laplacian for graphs

Complex time-varying networks are prominent models for a wide variety of spatiotemporal phenomena. 0.522 The functioning of networks depends crucially on their connectivity, yet reliable techniques for determining communities in spacetime networks remain elusive. We adapt successful spectral techniques from continuous-time dynamics on manifolds to the graph setting to fill this gap. We formulate an inflated dynamic Laplacian for graphs and develop a spectral theory to underpin the corresponding algorithmic realisations. We develop spectral clustering approaches for both multiplex and non-multiplex networks, based on the eigenvectors of the inflated dynamic Laplacian and specialised Sparse EigenBasis Approximation (SEBA) post-processing of these eigenvectors. We demonstrate that our approach can outperform the Leiden algorithm applied both in spacetime and layer-by-layer, and we analyse voting data from the US Senate (where senators come and go as congresses evolve) to quantify increasing polarisation in time.

link

2024-09-17

Contrasformer: A Brain Network Contrastive Transformer for Neurodegenerative Condition Identification

Understanding neurological disorders is a fundamental problem in neuroscience, which often requires the analysis of brain networks derived from functional magnetic resonance imaging (fMRI) data. Despite the prevalence of Graph Neural Networks (GNNs) and Graph Transformers in various domains, applying them to brain networks faces challenges. Specifically, the datasets are severely impacted by the noise caused by distribution shifts across sub-populations and the neglect of node identities, both of which obstruct the identification of disease-specific patterns. 0.504 To tackle these challenges, we propose Contrasformer, a novel contrastive brain network Transformer. It generates a prior-knowledge-enhanced contrast graph to address the distribution shifts across sub-populations by a two-stream attention mechanism. A cross attention with identity embedding highlights the identity of nodes, and three auxiliary losses ensure group consistency. Evaluated on 4 functional brain network datasets over 4 different diseases, Contrasformer outperforms the state-of-the-art methods for brain networks by achieving up to 10.8% improvement in accuracy, which demonstrates its efficacy in neurological disorder identification. Case studies illustrate its interpretability, especially in the context of neuroscience. This paper provides a solution for analyzing brain networks, offering valuable insights into neurological disorders. Our code is available at https://github.com/AngusMonroe/Contrasformer.

link

2024-09-17

Latent mixed-effect models for high-dimensional longitudinal data

Modelling longitudinal data is an important yet challenging task. These datasets can be high-dimensional, contain non-linear effects and time-varying covariates. Gaussian process (GP) prior-based variational autoencoders (VAEs) have emerged as a promising approach due to their ability to model time-series data. However, they are costly to train and struggle to fully exploit the rich covariates characteristic of longitudinal data, making them difficult for practitioners to use effectively. 0.513 In this work, we leverage linear mixed models (LMMs) and amortized variational inference to provide conditional priors for VAEs, and propose LMM-VAE, a scalable, interpretable and identifiable model. We highlight theoretical connections between it and GP-based techniques, providing a unified framework for this class of methods. Our proposal performs competitively compared to existing approaches across simulated and real-world datasets.

link

2024-09-17

Unveiling the Social Fabric: A Temporal, Nation-Scale Social Network and its Characteristics

Social networks shape individuals' lives, influencing everything from career paths to health. This paper presents a registry-based, multi-layer and temporal network of the entire Danish population in the years 2008-2021 (roughly 7.2 million individuals). Our network maps the relationships formed through family, households, neighborhoods, colleagues and classmates. 0.552 We outline key properties of this multiplex network, introducing both an individual-focused perspective as well as a bipartite representation. We show how to aggregate and combine the layers, and how to efficiently compute network measures such as shortest paths in large administrative networks. Our analysis reveals how past connections reappear later in other layers, that the number of relationships aggregated over time reflects the position in the income distribution, and that we can recover canonical shortest path length distributions when appropriately weighting connections. 0.542 Along with the network data, we release a Python package that uses the bipartite network representation for efficient analysis.

link

2024-09-17

Navigating Process Mining: A Case study using pm4py

Process-mining techniques have emerged as powerful tools for analyzing event data to gain insights into business processes. 0.515 In this paper, we present a comprehensive analysis of road traffic fine management processes using the pm4py library in Python. We start by importing an event log dataset and explore its characteristics, including the distribution of activities and process variants. Through filtering and statistical analysis, we uncover key patterns and variations in the process executions. Subsequently, we apply various process-mining algorithms, including the Alpha Miner, Inductive Miner, and Heuristic Miner, to discover process models from the event log data. We visualize the discovered models to understand the workflow structures and dependencies within the process. Additionally, we discuss the strengths and limitations of each mining approach in capturing the underlying process dynamics. Our findings shed light on the efficiency and effectiveness of road traffic fine management processes, providing valuable insights for process optimization and decision-making. This study demonstrates the utility of pm4py in facilitating process mining tasks and its potential for analyzing real-world business processes.

link

2024-09-17

Unlocking NACE Classification Embeddings with OpenAI for Enhanced Analysis and Processing

The Statistical Classification of Economic Activities in the European Community (NACE) is the standard classification system for the categorization of economic and industrial activities within the European Union. This paper proposes a novel approach to transform the NACE classification into low-dimensional embeddings, using state-of-the-art models and dimensionality reduction techniques. The primary challenge is the preservation of the hierarchical structure inherent within the original NACE classification while reducing the number of dimensions. To address this issue, we introduce custom metrics designed to quantify the retention of hierarchical relationships throughout the embedding and reduction processes. The evaluation of these metrics demonstrates the effectiveness of the proposed methodology in retaining the structural information essential for insightful analysis. This approach not only facilitates the visual exploration of economic activity relationships, but also increases the efficacy of downstream tasks, including clustering, classification, integration with other classifications, and others. 0.506 Through experimental validation, the utility of our proposed framework in preserving hierarchical structures within the NACE classification is showcased, thereby providing a valuable tool for researchers and policymakers to understand and leverage any hierarchical data.

link

2024-09-17

CountChain: A Decentralized Oracle Network for Counting Systems

Blockchain integration in industries like online advertising is hindered by its connectivity limitations to off-chain data. These industries heavily rely on precise counting systems for collecting and analyzing off-chain data. This requires mechanisms, often called oracles, to feed off-chain data into smart contracts. However, current oracle solutions are ill-suited for counting systems since the oracles do not know when to expect the data, posing a significant challenge. To address this, we present CountChain, a decentralized oracle network for counting systems. In CountChain, data is received by all oracle nodes, and any node can submit a proposition request. Each proposition contains enough data to evaluate the occurrence of an event. 0.546 Only randomly selected nodes participate in a game to evaluate the truthfulness of each proposition by providing proof and some stake. Finally, the propositions with the outcome of True increment the counter in a smart contract. Thus, instead of a contract calling oracles for data, in CountChain, the oracles call a smart contract when the data is available. Furthermore, we present a formal analysis and experimental evaluation of the system's parameters on over half a million data points to obtain optimal system parameters. In such conditions, our game-theoretical analysis demonstrates that a Nash equilibrium exists wherein all rational parties participate with honesty.
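
The push-style flow described above (nodes propose, a random committee evaluates, True outcomes increment a counter) can be simulated in a few lines. Everything here is a stand-in: the class and function names are hypothetical, majority voting is an assumed decision rule, and stakes and proofs are omitted, so this is not CountChain's actual protocol.

```python
import random


class CounterContract:
    """Stand-in for the on-chain counter (hypothetical, in-memory only)."""

    def __init__(self):
        self.count = 0

    def increment(self):
        self.count += 1


def honest_node(event_data):
    """An oracle node that reports truthfully whether the event occurred."""
    return bool(event_data.get("occurred", False))


def evaluate_proposition(event_data, nodes, committee_size, rng):
    """Randomly select a committee and accept on a strict majority.

    Majority voting is an assumption of this sketch; the abstract
    describes a staked game without giving its decision rule.
    """
    committee = rng.sample(nodes, committee_size)
    votes = [node(event_data) for node in committee]
    return sum(votes) > committee_size / 2


rng = random.Random(42)
nodes = [honest_node] * 7
contract = CounterContract()

# The oracles push to the contract when data arrives; the contract
# never polls the oracles.
events = [{"occurred": True}, {"occurred": False}, {"occurred": True}]
for event in events:
    if evaluate_proposition(event, nodes, committee_size=3, rng=rng):
        contract.increment()
```

With only honest nodes, the counter ends up equal to the number of events that actually occurred, regardless of which committee is drawn.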

link

2024-09-16

Causal Discovery in Recommender Systems: Example and Discussion

Causality is receiving increasing attention from the artificial intelligence and machine learning communities. 0.843 This paper gives an example of modelling a recommender system problem using causal graphs. 0.779 Specifically, we approached the causal discovery task to learn a causal graph by combining observational data from an open-source dataset with prior knowledge. 0.831 The resulting causal graph shows that only a few variables effectively influence the analysed feedback signals. 0.823 This contrasts with the recent trend in the machine learning community to include more and more variables in massive models, such as neural networks.

link

2024-09-16

Robust image representations with counterfactual contrastive learning

Contrastive pretraining can substantially increase model generalisation and downstream performance. However, the quality of the learned representations is highly dependent on the data augmentation strategy applied to generate positive pairs. Positive contrastive pairs should preserve semantic meaning while discarding unwanted variations related to the data acquisition domain. Traditional contrastive pipelines attempt to simulate domain shifts through pre-defined generic image transformations. However, these do not always mimic realistic and relevant domain variations for medical imaging such as scanner differences. To tackle this issue, we herein introduce counterfactual contrastive learning, a novel framework leveraging recent advances in causal image synthesis to create contrastive positive pairs that faithfully capture relevant domain variations. 0.586 Our method, evaluated across five datasets encompassing both chest radiography and mammography data, for two established contrastive objectives (SimCLR and DINO-v2), outperforms standard contrastive learning in terms of robustness to acquisition shift. Notably, counterfactual contrastive learning achieves superior downstream performance on both in-distribution and external datasets, especially for images acquired with scanners under-represented in the training set. Further experiments show that the proposed framework extends beyond acquisition shifts, with models trained with counterfactual contrastive learning substantially improving subgroup performance across biological sex.

link

2024-09-16

TPFL: Tsetlin-Personalized Federated Learning with Confidence-Based Clustering

The world of Machine Learning (ML) has witnessed rapid changes in terms of new models and ways to process users' data. The majority of work that has been done is focused on Deep Learning (DL) based approaches. However, with the emergence of new algorithms such as the Tsetlin Machine (TM) algorithm, there is growing interest in exploring alternative approaches that may offer unique advantages in certain domains or applications. One of these domains is Federated Learning (FL), in which users' privacy is of utmost importance. Due to its novelty, FL has seen a surge in the incorporation of personalization techniques to enhance model accuracy while maintaining user privacy under personalized conditions. In this work, we propose a novel approach dubbed TPFL: Tsetlin-Personalized Federated Learning, in which models are grouped into clusters based on their confidence towards a specific class. In this way, clustering can benefit from two key advantages. Firstly, clients share only what they are confident about, resulting in the elimination of wrongful weight aggregation among clients whose data for a specific class may not have been sufficient during the training. This phenomenon is prevalent when the data are non-Independent and Identically Distributed (non-IID). 0.532 Secondly, by sharing only weights towards a specific class, communication cost is substantially reduced, making TPFL efficient in terms of both accuracy and communication cost. TPFL demonstrated the highest accuracy on three different datasets, namely MNIST, FashionMNIST and FEMNIST.

link

2024-09-16

"The Data Says Otherwise"-Towards Automated Fact-checking and Communication of Data Claims

Fact-checking data claims requires data evidence retrieval and analysis, which can become tedious and intractable when done manually. This work presents Aletheia, an automated fact-checking prototype designed to facilitate data claims verification and enhance data evidence communication. For verification, we utilize a pre-trained LLM to parse the semantics for evidence retrieval. To effectively communicate the data evidence, we design representations in two forms: data tables and visualizations, tailored to various data fact types. 0.504 Additionally, we design interactions that showcase a real-world application of these techniques. We evaluate the performance of two core NLP tasks with a curated dataset comprising 400 data claims and compare the two representation forms regarding viewers' assessment time, confidence, and preference via a user study with 20 participants. The evaluation offers insights into the feasibility and bottlenecks of using LLMs for data fact-checking tasks, potential advantages and disadvantages of using visualizations over data tables, and design recommendations for presenting data evidence.

link

2024-09-16

Impact Of Emotions on Information Seeking And Sharing Behaviors During Pandemic

We propose a novel approach to assess the public's coping behavior during the COVID-19 outbreak by examining the emotions. 0.529 Specifically, we explore (1) changes in the public's emotions with the COVID-19 crisis progression and (2) the impacts of the public's emotions on their information-seeking, information-sharing behaviors, and compliance with stay-at-home policies. We base the study on the appraisal tendency framework, detect the public's emotions by fine-tuning a pre-trained RoBERTa model, and cross-analyze third-party behavioral data. We demonstrate the feasibility and reliability of our proposed approach in providing a large-scale examination of the public's emotions and coping behaviors in a real-world crisis: COVID-19. The approach complements prior crisis communication research, mainly based on self-reported, small-scale experiments and survey data. Our results show that anger and fear are more prominent than other emotions experienced by the public at the pandemic's outbreak stage. Results also show that the extent of low certainty and passive emotions (e.g., sadness, fear) was related to increased information-seeking and information-sharing behaviors. Additionally, high-certainty (e.g., anger) and low-certainty (e.g., sadness, fear) emotions during the outbreak correlated to the public's compliance with stay-at-home orders. 0.508

link

Explainability Research

2024-09-26

Learning Occlusion-aware Decision-making from Agent Interaction via Active Perception

Occlusion-aware decision-making is essential in autonomous driving due to the high uncertainty of various occlusions. Recent occlusion-aware decision-making methods encounter issues such as high computational complexity, scenario scalability challenges, or reliance on limited expert data. Benefiting from automatically generating data by exploration randomization, we uncover that reinforcement learning (RL) may show promise in occlusion-aware decision-making. However, previous occlusion-aware RL faces challenges in expanding to various dynamic and static occlusion scenarios, low learning efficiency, and lack of predictive ability. To address these issues, we introduce Pad-AI, a self-reinforcing framework to learn occlusion-aware decision-making through active perception. Pad-AI utilizes vectorized representation to represent occluded environments efficiently and learns over the semantic motion primitives to focus on high-level active perception exploration. Furthermore, Pad-AI integrates prediction and RL within a unified framework to provide risk-aware learning and security guarantees. 0.545 Our framework was tested in challenging scenarios under both dynamic and static occlusions and demonstrated efficient and general perception-aware exploration performance compared to other strong baselines in closed-loop evaluations.

link

2024-09-26

DarkSAM: Fooling Segment Anything Model to Segment Nothing

Segment Anything Model (SAM) has recently gained much attention for its outstanding generalization to unseen data and tasks. Despite its promising prospects, the vulnerabilities of SAM, especially to universal adversarial perturbation (UAP), have not been thoroughly investigated yet. 0.515 In this paper, we propose DarkSAM, the first prompt-free universal attack framework against SAM, including a semantic decoupling-based spatial attack and a texture distortion-based frequency attack. We first divide the output of SAM into foreground and background. Then, we design a shadow target strategy to obtain the semantic blueprint of the image as the attack target. DarkSAM is dedicated to fooling SAM by extracting and destroying crucial object features from images in both spatial and frequency domains. In the spatial domain, we disrupt the semantics of both the foreground and background in the image to confuse SAM. In the frequency domain, we further enhance the attack effectiveness by distorting the high-frequency components (i.e., texture information) of the image. Consequently, with a single UAP, DarkSAM renders SAM incapable of segmenting objects across diverse images with varying prompts. Experimental results on four datasets for SAM and its two variant models demonstrate the powerful attack capability and transferability of DarkSAM.

link

2024-09-26

Explaining Explaining

Explanation is key to people having confidence in high-stakes AI systems. 0.623However, machine-learning-based systems - which account for almost all current AI - can't explain because they are usually black boxes. 0.597The explainable AI (XAI) movement hedges this problem by redefining "explanation". 0.539The human-centered explainable AI (HCXAI) movement identifies the explanation-oriented needs of users but can't fulfill them because of its commitment to machine learning. 0.524In order to achieve the kinds of explanations needed by real people operating in critical domains, we must rethink how to approach AI. 0.606We describe a hybrid approach to developing cognitive agents that uses a knowledge-based infrastructure supplemented by data obtained through machine learning when applicable.These agents will serve as assistants to humans who will bear ultimate responsibility for the decisions and actions of the human-robot team.We illustrate the explanatory potential of such agents using the under-the-hood panels of a demonstration system in which a team of simulated robots collaborates on a search task assigned by a human.

link

2024-09-26

AI-Powered Augmented Reality for Satellite Assembly, Integration and Test

The integration of Artificial Intelligence (AI) and Augmented Reality (AR) is set to transform satellite Assembly, Integration, and Testing (AIT) processes by enhancing precision, minimizing human error, and improving operational efficiency in cleanroom environments.This paper presents a technical description of the European Space Agency's (ESA) project "AI for AR in Satellite AIT," which combines real-time computer vision and AR systems to assist technicians during satellite assembly.Leveraging Microsoft HoloLens 2 as the AR interface, the system delivers context-aware instructions and real-time feedback, tackling the complexities of object recognition and 6D pose estimation in AIT workflows.All AI models demonstrated over 70% accuracy, with the detection model exceeding 95% accuracy, indicating a high level of performance and reliability. 0.538A key contribution of this work lies in the effective use of synthetic data for training AI models in AR applications, addressing the significant challenges of obtaining real-world datasets in highly dynamic satellite environments, as well as the creation of the Segmented Anything Model for Automatic Labelling (SAMAL), which facilitates the automatic annotation of real data, achieving speeds up to 20 times faster than manual human annotation.The findings demonstrate the efficacy of AI-driven AR systems in automating critical satellite assembly tasks, setting a foundation for future innovations in the space industry.

link

2024-09-25

Navigating the Maze of Explainable AI: A Systematic Approach to Evaluating Methods and Metrics

Explainable AI (XAI) is a rapidly growing domain with a myriad of proposed methods as well as metrics aiming to evaluate their efficacy. 0.603However, current studies are often of limited scope, examining only a handful of XAI methods and ignoring underlying design parameters for performance, such as the model architecture or the nature of input data.Moreover, they often rely on one or a few metrics and neglect thorough validation, increasing the risk of selection bias and ignoring discrepancies among metrics.These shortcomings leave practitioners confused about which method to choose for their problem.In response, we introduce LATEC, a large-scale benchmark that critically evaluates 17 prominent XAI methods using 20 distinct metrics.We systematically incorporate vital design parameters like varied architectures and diverse input modalities, resulting in 7,560 examined combinations.Through LATEC, we showcase the high risk of conflicting metrics leading to unreliable rankings and consequently propose a more robust evaluation scheme.Further, we comprehensively evaluate various XAI methods to assist practitioners in selecting appropriate methods aligning with their needs.Curiously, the emerging top-performing method, Expected Gradients, is not examined in any relevant related study.LATEC reinforces its role in future XAI research by publicly releasing all 326k saliency maps and 378k metric scores as a (meta-)evaluation dataset.

link

2024-09-25

Enhancing Feature Selection and Interpretability in AI Regression Tasks Through Feature Attribution

Research in Explainable Artificial Intelligence (XAI) is increasing, aiming to make deep learning models more transparent. 0.688Most XAI methods focus on justifying the decisions made by Artificial Intelligence (AI) systems in security-relevant applications.However, relatively little attention has been given to using these methods to improve the performance and robustness of deep learning algorithms.Additionally, much of the existing XAI work primarily addresses classification problems.In this study, we investigate the potential of feature attribution methods to filter out uninformative features in input data for regression problems, thereby improving the accuracy and stability of predictions.We introduce a feature selection pipeline that combines Integrated Gradients with k-means clustering to select an optimal set of variables from the initial data space.To validate the effectiveness of this approach, we apply it to a real-world industrial problem - blade vibration analysis in the development process of turbo machinery.

link
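
The entry above builds its feature selection pipeline on Integrated Gradients. As a rough illustration of that attribution step only, here is a minimal sketch. The toy model `toy_model`, its gradient `toy_grad`, the zero baseline and the step count are all assumptions made for this example, and the k-means clustering stage mentioned in the abstract is not shown.

```python
def integrated_gradients(grad_fn, x, baseline, steps=50):
    """Approximate Integrated Gradients with a midpoint Riemann sum.

    grad_fn(point) must return the gradient of the model output
    with respect to each input feature at `point`.
    """
    n = len(x)
    grad_sum = [0.0] * n
    for s in range(steps):
        alpha = (s + 0.5) / steps  # midpoint of the s-th sub-interval of [0, 1]
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        grad = grad_fn(point)
        for i in range(n):
            grad_sum[i] += grad[i]
    # Scale the averaged gradient by the distance from the baseline.
    return [(xi - b) * g / steps for xi, b, g in zip(x, baseline, grad_sum)]


# Hypothetical regression model: the third feature is uninformative.
def toy_model(x):
    return 3.0 * x[0] + x[1] ** 2 + 0.0 * x[2]


def toy_grad(x):
    return [3.0, 2.0 * x[1], 0.0]


attributions = integrated_gradients(toy_grad, [1.0, 2.0, 5.0], [0.0, 0.0, 0.0])
```

In this toy case the attributions sum to the difference between the model output at the input and at the baseline, and the uninformative feature receives zero attribution, which is the kind of signal a downstream filtering step could use.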

2024-09-25

Towards User-Focused Research in Training Data Attribution for Human-Centered Explainable AI

While Explainable AI (XAI) aims to make AI understandable and useful to humans, it has been criticised for relying too much on formalism and solutionism, focusing more on mathematical soundness than user needs. 0.628We propose an alternative to this bottom-up approach inspired by design thinking: the XAI research community should adopt a top-down, user-focused perspective to ensure user relevance.We illustrate this with a relatively young subfield of XAI, Training Data Attribution (TDA).With the surge in TDA research and growing competition, the field risks repeating the same patterns of solutionism.We conducted a needfinding study with a diverse group of AI practitioners to identify potential user needs related to TDA.Through interviews (N=10) and a systematic survey (N=31), we uncovered new TDA tasks that are currently largely overlooked.We invite the TDA and XAI communities to consider these novel tasks and improve the user relevance of their research outcomes.

link

2024-09-25

CombU: A Combined Unit Activation for Fitting Mathematical Expressions with Neural Networks

The activation functions are fundamental to neural networks as they introduce non-linearity into data relationships, thereby enabling deep networks to approximate complex data relations.Existing efforts to enhance neural network performance have predominantly focused on developing new mathematical functions. 0.503However, we find that a well-designed combination of existing activation functions within a neural network can also achieve this objective.In this paper, we introduce the Combined Units activation (CombU), which employs different activation functions at various dimensions across different layers.This approach can be theoretically proven to fit most mathematical expressions accurately.The experiments conducted on four mathematical expression datasets, compared against six State-Of-The-Art (SOTA) activation function algorithms, demonstrate that CombU outperforms all SOTA algorithms in 10 out of 16 metrics and ranks in the top three for the remaining six metrics.

link
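
The CombU entry above describes using different activation functions on different dimensions of a layer. A minimal sketch of that general idea follows. The choice of ReLU and tanh, and the round-robin assignment of functions to dimensions, are assumptions for illustration and are not taken from the paper.

```python
import math


def relu(x):
    return max(0.0, x)


def combined_activation(vector, activations):
    """Apply activations[i % len(activations)] to dimension i of the vector."""
    return [activations[i % len(activations)](x) for i, x in enumerate(vector)]


# Even dimensions use ReLU, odd dimensions use tanh (assumed split).
hidden = [-1.0, -1.0, 2.0, 2.0]
activated = combined_activation(hidden, [relu, math.tanh])
```

A real layer would apply this to the output of a linear transform and could use a different assignment per layer.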

2024-09-25

Deep Learning and Machine Learning, Advancing Big Data Analytics and Management: Handy Appetizer

This book explores the role of Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning (DL) in driving the progress of big data analytics and management.The book focuses on simplifying the complex mathematical concepts behind deep learning, offering intuitive visualizations and practical case studies to help readers understand how neural networks and technologies like Convolutional Neural Networks (CNNs) work. 0.544It introduces several classic models and technologies such as Transformers, GPT, ResNet, BERT, and YOLO, highlighting their applications in fields like natural language processing, image recognition, and autonomous driving.The book also emphasizes the importance of pre-trained models and how they can enhance model performance and accuracy, with instructions on how to apply these models in various real-world scenarios.Additionally, it provides an overview of key big data management technologies like SQL and NoSQL databases, as well as distributed computing frameworks such as Apache Hadoop and Spark, explaining their importance in managing and processing vast amounts of data.Ultimately, the book underscores the value of mastering deep learning and big data management skills as critical tools for the future workforce, making it an essential resource for both beginners and experienced professionals.

link

2024-09-24

GraphGI: A GNN Explanation Method using Game Interaction

Graph Neural Networks (GNNs) have garnered significant attention and have been extensively utilized across various domains.However, similar to other deep learning models, GNNs are often viewed as black-box models, making it challenging to interpret their prediction mechanisms. 0.585Current graph explanation techniques focus on identifying key nodes or edges, attributing the critical data features that drive model predictions.Nevertheless, these features do not independently influence the model's outcomes; rather, they interact with one another to collectively affect predictions.In this work, we propose a novel explanatory method GraphGI, which identifies the coalition with the highest interaction strength and presents it as an explanatory subgraph.Given a trained model and an input graph, our method explains predictions by gradually incorporating significant edges into the selected subgraph.We utilize game-theoretic interaction values to assess the interaction strength after edge additions, ensuring that the newly added edges confer maximum interaction strength to the explanatory subgraph.To enhance computational efficiency, we adopt effective approximation techniques for calculating Shapley values and game-theoretic interaction values.Empirical evaluations demonstrate that our method achieves superior fidelity and sparsity, maintaining the interpretability of the results at a comprehensible level.

link
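
The GraphGI entry above relies on Shapley values and game-theoretic interaction values. For readers unfamiliar with them, here is a minimal sketch of exact Shapley values on a tiny cooperative game. The value function `toy_value` and the edge names are hypothetical, and the enumeration is exponential in the number of players, whereas the abstract says the authors use approximation techniques.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values by enumerating every coalition (exponential cost)."""
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for k in range(n):
            # Probability weight of a coalition of size k preceding player p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                phi[p] += weight * (value(s | {p}) - value(s))
    return phi


# Hypothetical game: edges "a" and "b" only matter together, "c" helps alone.
def toy_value(coalition):
    v = 0.0
    if "a" in coalition and "b" in coalition:
        v += 1.0
    if "c" in coalition:
        v += 0.5
    return v


phi = shapley_values(["a", "b", "c"], toy_value)
```

In this toy game, edges "a" and "b" share the credit for their joint contribution equally, which hints at why per-edge scores alone can hide interaction effects.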

2024-09-24

Machine learning approaches for automatic defect detection in photovoltaic systems

Solar photovoltaic (PV) modules are prone to damage during manufacturing, installation and operation which reduces their power conversion efficiency.This diminishes their positive environmental impact over the lifecycle.Continuous monitoring of PV modules during operation via unmanned aerial vehicles is essential to ensure that defective panels are promptly replaced or repaired to maintain high power conversion efficiencies.Computer vision provides an automatic, non-destructive and cost-effective tool for monitoring defects in large-scale PV plants.We review the current landscape of deep learning-based computer vision techniques used for detecting defects in solar modules.We compare and evaluate the existing approaches at different levels, namely the type of images used, data collection and processing method, deep learning architectures employed, and model interpretability.Most approaches use convolutional neural networks together with data augmentation or generative adversarial network-based techniques.We evaluate the deep learning approaches by performing interpretability analysis on classification tasks. 0.518This analysis reveals that the model focuses on the darker regions of the image to perform the classification.We find clear gaps in the existing approaches while also laying out the groundwork for mitigating these challenges when building new models.We conclude with the relevant research gaps that need to be addressed and approaches for progress in this field: integrating geometric deep learning with existing approaches for building more robust and reliable models, leveraging physics-based neural networks that combine domain expertise of physical laws to build more domain-aware deep learning models, and incorporating interpretability as a factor for building models that can be trusted. 0.615The review points towards a clear roadmap for making this technology commercially relevant.

link

2024-09-18

OSINT Clinic: Co-designing AI-Augmented Collaborative OSINT Investigations for Vulnerability Assessment

Small businesses need vulnerability assessments to identify and mitigate cyber risks.Cybersecurity clinics provide a solution by offering students hands-on experience while delivering free vulnerability assessments to local organizations.To scale this model, we propose an Open Source Intelligence (OSINT) clinic where students conduct assessments using only publicly available data.We enhance the quality of investigations in the OSINT clinic by addressing the technical and collaborative challenges.Over the duration of the 2023-24 academic year, we conducted a three-phase co-design study with six students.Our study identified key challenges in the OSINT investigations and explored how generative AI could address these performance gaps.We developed design ideas for effective AI integration based on the use of AI probes and collaboration platform features. 0.524A pilot with three small businesses highlighted both the practical benefits of AI in streamlining investigations, and limitations, including privacy concerns and difficulty in monitoring progress.

link

2024-09-18

An efficient wavelet-based physics-informed neural networks for singularly perturbed problems

Physics-informed neural networks (PINNs) are a class of deep learning models that utilize physics as differential equations to address complex problems, including ones that may involve limited data availability. 0.665 However, tackling solutions of differential equations with oscillations or singular perturbations and shock-like structures becomes challenging for PINNs. Considering these challenges, we designed an efficient wavelet-based PINNs (W-PINNs) model to solve singularly perturbed differential equations. Here, we represent the solution in wavelet space using a family of smooth-compactly supported wavelets. This framework represents the solution of a differential equation with significantly fewer degrees of freedom while still retaining the ability to capture, identify, and analyze the local structure of complex physical phenomena. The architecture allows the training process to search for a solution within wavelet space, making the process faster and more accurate. The proposed model does not rely on automatic differentiation for derivatives involved in differential equations and does not require any prior information regarding the behavior of the solution, such as the location of abrupt features. Thus, through a strategic fusion of wavelets with PINNs, W-PINNs excel at capturing localized nonlinear information, making them well-suited for problems showing abrupt behavior in certain regions, such as singularly perturbed problems. The efficiency and accuracy of the proposed neural network model are demonstrated in various test problems, i.e., highly singularly perturbed nonlinear differential equations, the FitzHugh-Nagumo (FHN), and Predator-prey interaction models. 0.509 The proposed design model compares favorably with traditional PINNs and the recently developed wavelet-based PINNs, which use wavelets as an activation function for solving nonlinear differential equations.

link

2024-09-18

PAD-FT: A Lightweight Defense for Backdoor Attacks via Data Purification and Fine-Tuning

Backdoor attacks pose a significant threat to deep neural networks, particularly as recent advancements have led to increasingly subtle implantation, making the defense more challenging. 0.537Existing defense mechanisms typically rely on an additional clean dataset as a standard reference and involve retraining an auxiliary model or fine-tuning the entire victim model.However, these approaches are often computationally expensive and not always feasible in practical applications.In this paper, we propose a novel and lightweight defense mechanism, termed PAD-FT, that does not require an additional clean dataset and fine-tunes only a very small part of the model to disinfect the victim model.To achieve this, our approach first introduces a simple data purification process to identify and select the most-likely clean data from the poisoned training dataset.The self-purified clean dataset is then used for activation clipping and fine-tuning only the last classification layer of the victim model.By integrating data purification, activation clipping, and classifier fine-tuning, our mechanism PAD-FT demonstrates superior effectiveness across multiple backdoor attack methods and datasets, as confirmed through extensive experimental evaluation.

link

2024-09-18

A Controlled Study on Long Context Extension and Generalization in LLMs

Broad textual understanding and in-context learning require language models that utilize full document contexts.Due to the implementation challenges associated with directly training long-context models, many methods have been proposed for extending models to handle long contexts.However, owing to differences in data and model classes, it has been challenging to compare these approaches, leading to uncertainty as to how to evaluate long-context performance and whether it differs from standard evaluation.We implement a controlled protocol for extension methods with a standardized evaluation, utilizing consistent base models and extension data.Our study yields several insights into long-context behavior.First, we reaffirm the critical role of perplexity as a general-purpose performance indicator even in longer-context tasks.Second, we find that current approximate attention methods systematically underperform across long-context tasks.Finally, we confirm that exact fine-tuning based methods are generally effective within the range of their extension, whereas extrapolation remains challenging.All codebases, models, and checkpoints will be made available open-source, promoting transparency and facilitating further research in this critical area of AI development. 0.632

link

2024-09-17

A Physics Informed Neural Network (PINN) Methodology for Coupled Moving Boundary PDEs

Physics-Informed Neural Network (PINN) is a novel multi-task learning framework useful for solving physical problems modeled using differential equations (DEs) by integrating the knowledge of physics and known constraints into the components of deep learning. 0.659A large class of physical problems in materials science and mechanics involve moving boundaries, where interface flux balance conditions are to be satisfied while solving DEs.Examples of such systems include free surface flows, shock propagation, solidification of pure and alloy systems etc.While recent research works have explored applicability of PINNs for an uncoupled system (such as solidification of pure system), the present work reports a PINN-based approach to solve coupled systems involving multiple governing parameters (energy and species, along with multiple interface balance equations).This methodology employs an architecture consisting of a separate network for each variable with a separate treatment of each phase, a training strategy which alternates between temporal learning and adaptive loss weighting, and a scheme which progressively reduces the optimisation space.While solving the benchmark problem of binary alloy solidification, it is distinctly successful at capturing the complex composition profile, which has a characteristic discontinuity at the interface and the resulting predictions align well with the analytical solutions.The procedure can be generalised for solving other transient multiphysics problems especially in the low-data regime and in cases where measurements can reveal new physics.

link

2024-09-17

Leveraging Reviewer Experience in Code Review Comment Generation

Modern code review is a ubiquitous software quality assurance process aimed at identifying potential issues within newly written code. Despite its effectiveness, the process demands large amounts of effort from the human reviewers involved. To help alleviate this workload, researchers have trained deep learning models to imitate human reviewers in providing natural language code reviews. Formally, this task is known as code review comment generation. Prior work has demonstrated improvements in this task by leveraging machine learning techniques and neural models, such as transfer learning and the transformer architecture. However, the quality of the model-generated reviews remains sub-optimal due to the quality of the open-source code review data used in model training. This is in part due to the data obtained from open-source projects where code reviews are conducted in a public forum, and reviewers possess varying levels of software development experience, potentially affecting the quality of their feedback. To account for this variation, we propose a suite of experience-aware training methods that utilise the reviewers' past authoring and reviewing experiences as signals for review quality. Specifically, we propose experience-aware loss functions (ELF), which use the reviewers' authoring and reviewing ownership of a project as weights in the model's loss function. Through this method, experienced reviewers' code reviews yield larger influence over the model's behaviour. Compared to the SOTA model, ELF was able to generate higher quality reviews in terms of accuracy, informativeness, and comment types generated. The key contribution of this work is the demonstration of how traditional software engineering concepts such as reviewer experience can be integrated into the design of AI-based automated code review models. 0.51

link
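
The entry above describes experience-aware loss functions that weight each training example by the reviewer's ownership of the project. A minimal sketch of such a weighted loss follows. The weighting form `base_weight + ownership`, the ownership range of 0 to 1, and all names are assumptions for illustration, not the paper's exact formulation.

```python
import math


def experience_weighted_nll(token_probs, ownership, base_weight=1.0):
    """Weighted mean negative log-likelihood over review comments.

    token_probs[i]: probabilities the model assigned to the reference
                    tokens of review comment i.
    ownership[i]:   reviewer ownership share for comment i, assumed in [0, 1].
    """
    total = 0.0
    weight_sum = 0.0
    for probs, own in zip(token_probs, ownership):
        nll = -sum(math.log(p) for p in probs) / len(probs)
        weight = base_weight + own  # assumed weighting scheme
        total += weight * nll
        weight_sum += weight
    return total / weight_sum


# Two comments: the model fits the first well and the second poorly.
probs = [[0.9, 0.8], [0.2, 0.1]]
uniform_loss = experience_weighted_nll(probs, [0.0, 0.0])
weighted_loss = experience_weighted_nll(probs, [1.0, 0.0])
```

With equal ownership the loss reduces to an ordinary mean. Raising the ownership of the first reviewer pulls the loss toward that reviewer's comment, so gradient updates would favour matching experienced reviewers.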

2024-09-17

A logical alarm for misaligned binary classifiers

If two agents disagree in their decisions, we may suspect they are not both correct. This intuition is formalized for evaluating agents that have carried out a binary classification task. Their agreements and disagreements on a joint test allow us to establish the only group evaluations logically consistent with their responses. This is done by establishing a set of axioms (algebraic relations) that must be universally obeyed by all evaluations of binary responders. A complete set of such axioms is possible for each ensemble of size N. The axioms for $N = 1, 2$ are used to construct a fully logical alarm - one that can prove that at least one ensemble member is malfunctioning using only unlabeled data. The similarities of this approach to formal software verification and its utility for recent agendas of safe guaranteed AI are discussed. 0.543

link
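
The intuition in the entry above can be illustrated with a much simpler bound than the paper's axioms. Whenever two binary classifiers disagree on an item, at least one of them is wrong, so their error rates must sum to at least the disagreement rate. The sketch below uses only that fact. The function names and the accuracy threshold are assumptions, and this is not the paper's construction.

```python
def disagreement_rate(labels_a, labels_b):
    """Fraction of items on which two classifiers give different labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(a != b for a, b in zip(labels_a, labels_b)) / len(labels_a)


def alarm(labels_a, labels_b, min_accuracy):
    """Return True if both classifiers cannot both reach min_accuracy.

    If both were at least min_accuracy accurate, their error rates would
    sum to at most 2 * (1 - min_accuracy). Each disagreement forces at
    least one error, so a higher disagreement rate is a contradiction.
    """
    return disagreement_rate(labels_a, labels_b) > 2 * (1 - min_accuracy)


# Unlabeled test: only the two classifiers' outputs are known.
a = [1, 1, 0, 0, 1, 0, 1, 0]
b = [1, 0, 0, 1, 1, 1, 1, 1]
rate = disagreement_rate(a, b)
fires_at_80 = alarm(a, b, 0.8)
fires_at_70 = alarm(a, b, 0.7)
```

The alarm needs no ground-truth labels. It cannot say which classifier is failing, only that the pair is inconsistent with the required accuracy.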

2024-09-17

Scale generalisation properties of extended scale-covariant and scale-invariant Gaussian derivative networks on image datasets with spatial scaling variations

This paper presents an in-depth analysis of the scale generalisation properties of the scale-covariant and scale-invariant Gaussian derivative networks, complemented with both conceptual and algorithmic extensions.For this purpose, Gaussian derivative networks are evaluated on new rescaled versions of the Fashion-MNIST and the CIFAR-10 datasets, with spatial scaling variations over a factor of 4 in the testing data, that are not present in the training data.Additionally, evaluations on the previously existing STIR datasets show that the Gaussian derivative networks achieve better scale generalisation than previously reported for these datasets for other types of deep networks. We first experimentally demonstrate that the Gaussian derivative networks have quite good scale generalisation properties on the new datasets, and that average pooling of feature responses over scales may sometimes also lead to better results than the previously used approach of max pooling over scales.Then, we demonstrate that using a spatial max pooling mechanism after the final layer enables localisation of non-centred objects in image domain, with maintained scale generalisation properties.We also show that regularisation during training, by applying dropout across the scale channels, referred to as scale-channel dropout, improves both the performance and the scale generalisation. In additional ablation studies, we demonstrate that discretisations of Gaussian derivative networks, based on the discrete analogue of the Gaussian kernel in combination with central difference operators, perform best or among the best, compared to a set of other discrete approximations of the Gaussian derivative kernels. Finally, by visualising the activation maps and the learned receptive fields, we demonstrate that the Gaussian derivative networks have very good explainability properties. 0.511

link

2024-09-17

Score Forgetting Distillation: A Swift, Data-Free Method for Machine Unlearning in Diffusion Models

The machine learning community is increasingly recognizing the importance of fostering trust and safety in modern generative AI (GenAI) models. 0.582 We posit machine unlearning (MU) as a crucial foundation for developing safe, secure, and trustworthy GenAI models. Traditional MU methods often rely on stringent assumptions and require access to real data. This paper introduces Score Forgetting Distillation (SFD), an innovative MU approach that promotes the forgetting of undesirable information in diffusion models by aligning the conditional scores of "unsafe" classes or concepts with those of "safe" ones. To eliminate the need for real data, our SFD framework incorporates a score-based MU loss into the score distillation objective of a pretrained diffusion model. This serves as a regularization term that preserves desired generation capabilities while enabling the production of synthetic data through a one-step generator. Our experiments on pretrained label-conditional and text-to-image diffusion models demonstrate that our method effectively accelerates the forgetting of target classes or concepts during generation, while preserving the quality of other classes or concepts. This unlearned and distilled diffusion not only pioneers a novel concept in MU but also accelerates the generation speed of diffusion models. Our experiments and studies on a range of diffusion models and datasets confirm that our approach is generalizable, effective, and advantageous for MU in diffusion models.

link

2024-09-17

Multi-OCT-SelfNet: Integrating Self-Supervised Learning with Multi-Source Data Fusion for Enhanced Multi-Class Retinal Disease Classification

In the medical domain, acquiring large datasets poses significant challenges due to privacy concerns. Nonetheless, the development of a robust deep-learning model for retinal disease diagnosis necessitates a substantial dataset for training. The capacity to generalize effectively on smaller datasets remains a persistent challenge. The scarcity of data presents a significant barrier to the practical implementation of scalable medical AI solutions. 0.507 To address this issue, we combine a wide range of data sources to improve performance and generalization to new data, and develop a self-supervised framework based on large language models (LLMs), SwinV2, to gain a deeper understanding of multi-modal dataset representations, enhancing the model's ability to extrapolate to new data for the detection of eye diseases using optical coherence tomography (OCT) images. We adopt a two-phase training methodology: self-supervised pre-training, followed by fine-tuning on a downstream supervised classifier. An ablation study conducted across three datasets with various encoder backbones, and in scenarios without data fusion, with low data availability, and without self-supervised pre-training, highlights the robustness of our method. Our findings demonstrate consistent performance across these diverse conditions, showcasing superior generalization capabilities compared to the baseline model, ResNet-50.

link

2024-09-12

Taylor-Sensus Network: Embracing Noise to Enlighten Uncertainty for Scientific Data

Uncertainty estimation is crucial in scientific data for machine learning. Current uncertainty estimation methods mainly focus on the model's inherent uncertainty, while neglecting the explicit modeling of noise in the data. Furthermore, noise estimation methods typically rely on temporal or spatial dependencies, which can pose a significant challenge in structured scientific data where such dependencies among samples are often absent. To address these challenges in scientific research, we propose the Taylor-Sensus Network (TSNet). TSNet innovatively uses a Taylor series expansion to model complex, heteroscedastic noise and proposes a deep Taylor block for noise-distribution awareness. TSNet includes a noise-aware contrastive learning module and a data density perception module for aleatoric and epistemic uncertainty. Additionally, an uncertainty combination operator is used to integrate these uncertainties, and the network is trained using a novel heteroscedastic mean square error loss. TSNet demonstrates superior performance over mainstream and state-of-the-art methods in experiments, highlighting its potential in scientific research and noise resistance. It will be open-sourced to facilitate the community of "AI for Science". 0.523

link

2024-09-12

Quantifying Aleatoric and Epistemic Dynamics Uncertainty via Local Conformal Calibration

Whether learned, simulated, or analytical, approximations of a robot's dynamics can be inaccurate when encountering novel environments. 0.512 Many approaches have been proposed to quantify the aleatoric uncertainty of such methods, i.e. uncertainty resulting from stochasticity; however, these estimates alone are not enough to properly estimate the uncertainty of a model in a novel environment, where the actual dynamics can change. Such changes can induce epistemic uncertainty, i.e. uncertainty due to a lack of information/data. Accounting for both epistemic and aleatoric dynamics uncertainty in a theoretically-grounded way remains an open problem. We introduce Local Uncertainty Conformal Calibration (LUCCa), a conformal prediction-based approach that calibrates the aleatoric uncertainty estimates provided by dynamics models to generate probabilistically-valid prediction regions of the system's state. We account for both epistemic and aleatoric uncertainty non-asymptotically, without strong assumptions about the form of the true dynamics or how it changes. The calibration is performed locally in the state-action space, leading to uncertainty estimates that are useful for planning. We validate our method by constructing probabilistically-safe plans for a double-integrator under significant changes in dynamics.

link
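
The LUCCa entry above builds on conformal prediction. As background, here is a minimal sketch of the standard split conformal step for a scalar prediction, which turns held-out residuals into an interval with finite-sample coverage. It is not the local, state-action-dependent calibration the abstract describes, and the residual values and function names are made up for the example.

```python
import math


def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this coverage level.
        return math.inf
    return sorted(scores)[rank - 1]


def prediction_interval(prediction, scores, alpha):
    """Symmetric interval around a point prediction."""
    q = conformal_quantile(scores, alpha)
    return prediction - q, prediction + q


# Hypothetical absolute residuals |y - y_hat| on a calibration set.
residuals = [0.5, 0.1, 0.9, 0.3, 0.7, 0.2, 0.8, 0.4, 0.6]
q80 = conformal_quantile(residuals, alpha=0.2)
low, high = prediction_interval(1.0, residuals, alpha=0.2)
```

Under exchangeability of calibration and test points, an interval built this way covers the true value with probability at least 1 - alpha. A local variant would compute a separate quantile for each region of the input space.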

2024-09-11

A Cost-Aware Approach to Adversarial Robustness in Neural Networks

Considering the growing prominence of production-level AI and the threat of adversarial attacks that can evade a model at run-time, evaluating the robustness of models to these evasion attacks is of critical importance. 0.503 Additionally, testing model changes likely means deploying the models to a device (e.g. a car, a medical imaging device, or a drone) to see how they affect performance, making un-tested changes a public problem that reduces development speed, increases cost of development, and makes it difficult (if not impossible) to parse cause from effect. In this work, we used survival analysis as a cloud-native, time-efficient and precise method for predicting model performance in the presence of adversarial noise. For neural networks in particular, the relationships between the learning rate, batch size, training time, convergence time, and deployment cost are highly complex, so researchers generally rely on benchmark datasets to assess the ability of a model to generalize beyond the training data. To address this, we propose using accelerated failure time models to measure the effect of hardware choice, batch size, number of epochs, and test-set accuracy by using adversarial attacks to induce failures on a reference model architecture before deploying the model to the real world. We evaluate several GPU types and use the Tree Parzen Estimator to maximize model robustness and minimize model run-time simultaneously. This provides a way to evaluate the model and optimise it in a single step, while simultaneously allowing us to model the effect of model parameters on training time, prediction time, and accuracy. Using this technique, we demonstrate that newer, more-powerful hardware does decrease the training time, but with a monetary and power cost that far outpaces the marginal gains in accuracy.

link

2024-09-11

SoK: Security and Privacy Risks of Medical AI

The integration of technology and healthcare has ushered in a new era where software systems, powered by artificial intelligence and machine learning, have become essential components of medical products and services. While these advancements hold great promise for enhancing patient care and healthcare delivery efficiency, they also expose sensitive medical data and system integrity to potential cyberattacks. This paper explores the security and privacy threats posed by AI/ML applications in healthcare. 0.502 Through a thorough examination of existing research across a range of medical domains, we have identified significant gaps in understanding the adversarial attacks targeting medical AI systems. 0.553 By outlining specific adversarial threat models for medical settings and identifying vulnerable application domains, we lay the groundwork for future research that investigates the security and resilience of AI-driven medical systems. 0.525 Through our analysis of different threat models and feasibility studies on adversarial attacks in different medical domains, we provide compelling insights into the pressing need for cybersecurity research in the rapidly evolving field of AI healthcare technology. 0.511

link

2024-09-10

Connecting Concept Convexity and Human-Machine Alignment in Deep Neural Networks

Understanding how neural networks align with human cognitive processes is a crucial step toward developing more interpretable and reliable AI systems. 0.54 Motivated by theories of human cognition, this study examines the relationship between convexity in neural network representations and human-machine alignment based on behavioral data. We identify a correlation between these two dimensions in pretrained and fine-tuned vision transformer models. Our findings suggest that the convex regions formed in latent spaces of neural networks to some extent align with human-defined categories and reflect the similarity relations humans use in cognitive tasks. While optimizing for alignment generally enhances convexity, increasing convexity through fine-tuning yields inconsistent effects on alignment, which suggests a complex relationship between the two. This study presents a first step toward understanding the relationship between the convexity of latent representations and human-machine alignment.

link

2024-09-10

Improving the Precision of CNNs for Magnetic Resonance Spectral Modeling

Magnetic resonance spectroscopic imaging is a widely available imaging modality that can non-invasively provide a metabolic profile of the tissue of interest, yet is challenging to integrate clinically. One major reason is the expensive, expert data processing and analysis that is required. Using machine learning to predict MRS-related quantities offers avenues around this problem, but deep learning models bring their own challenges, especially model trust. 0.524 Current research trends focus primarily on mean error metrics, but comprehensive precision metrics are also needed, e.g. standard deviations, confidence intervals, etc. This work highlights why more comprehensive error characterization is important and how to improve the precision of CNNs for spectral modeling, a quantitative task. The results highlight advantages and trade-offs of these techniques that should be considered when addressing such regression tasks with CNNs. Detailed insights into the underlying mechanisms of each technique, and how they interact with other techniques, are discussed in depth.

link

2024-09-10

Towards Understanding Human Emotional Fluctuations with Sparse Check-In Data

Data sparsity is a key challenge limiting the power of AI tools across various domains. The problem is especially pronounced in domains that require active user input rather than measurements derived from automated sensors. It is a critical barrier to harnessing the full potential of AI in domains requiring active user engagement, such as self-reported mood check-ins, where capturing a continuous picture of emotional states is essential. In this context, sparse data can hinder efforts to capture the nuances of individual emotional experiences such as causes, triggers, and contributing factors. Existing methods for addressing data scarcity often rely on heuristics or large established datasets, favoring deep learning models that lack adaptability to new domains. This paper proposes a novel probabilistic framework that integrates user-centric feedback-based learning, allowing for personalized predictions despite limited data. Achieving 60% accuracy in predicting user states among 64 options (chance of 1/64), this framework effectively mitigates data sparsity. It is versatile across various applications, bridging the gap between theoretical AI research and practical deployment. 0.503

link

Ethics Research

2024-09-26

Functional Classification of Spiking Signal Data Using Artificial Intelligence Techniques: A Review

Human brain neuron activities are incredibly significant nowadays. Neuronal behavior is assessed by analyzing signal data such as electroencephalography (EEG), which can offer scientists valuable information about diseases and human-computer interaction. One of the difficulties researchers confront while evaluating these signals is the existence of large volumes of spike data. Spikes are some considerable parts of signal data that can happen as a consequence of vital biomarkers or physical issues such as electrode movements. Hence, distinguishing types of spikes is important. From this spot, the spike classification concept commences. Previously, researchers classified spikes manually. The manual classification was not precise enough as it involves extensive analysis. Consequently, Artificial Intelligence (AI) was introduced into neuroscience to assist clinicians in classifying spikes correctly. 0.501 This review discusses the importance and use of AI in spike classification, focusing on the recognition of neural activity noises. The task is divided into three main components: preprocessing, classification, and evaluation. Existing methods are introduced and their importance is determined. The review also highlights the need for more efficient algorithms. The primary goal is to provide a perspective on spike classification for future research and provide a comprehensive understanding of the methodologies and issues involved. The review organizes materials in the spike classification field for future studies. In this work, numerous studies were extracted from different databases. The PRISMA-related research guidelines were then used to choose papers. Then, research studies based on spike classification using machine learning and deep learning approaches with effective preprocessing were selected.

link

2024-09-26

Software for the SpaceDREAM Robotic Arm

Impedance-controlled robots are widely used on Earth to perform interaction-rich tasks and will be a key enabler for In-Space Servicing, Assembly and Manufacturing (ISAM) activities. This paper introduces the software architecture used on the On-Board Computer (OBC) for the planned SpaceDREAM mission, which aims to validate such a robotic arm in Low Earth Orbit (LEO) and is conducted by the German Aerospace Center (DLR) in cooperation with KINETIK Space GmbH and the Technical University of Munich (TUM). During the mission, several free motion as well as contact tasks are to be performed in order to verify proper functionality of the robot in position and impedance control on joint level as well as in cartesian control. The tasks are selected to be representative of subsequent servicing missions, e.g. those requiring interface docking or precise manipulation. The software on the OBC commands the robot's joints via SpaceWire to perform those mission tasks, reads camera images and data from additional sensors, and sends telemetry data through an Ethernet link via the spacecraft down to Earth. It is set up to execute a predefined mission after receiving a start signal from the spacecraft, while it should be extendable to receive commands from Earth for later missions. The core design principle was to reuse as much existing software as possible and to stay as close as possible to existing robot software stacks at DLR. This allowed for a quick full operational start of the robot arm compared to a custom development of all robot software, a lower entry barrier for software developers, as well as a reuse of existing libraries. 0.504 While not every line of code can be tested with this design, most of the software has already proven its functionality through daily execution on multiple robot systems. 0.501

link

2024-09-26

Dr. GPT in Campus Counseling: Understanding Higher Education Students' Opinions on LLM-assisted Mental Health Services

In response to the increasing mental health challenges faced by college students, we sought to understand their perspectives on how AI applications, particularly Large Language Models (LLMs), can be leveraged to enhance their mental well-being. 0.547 Through pilot interviews with ten diverse students, we explored their opinions on the use of LLMs across five fictional scenarios: General Information Inquiry, Initial Screening, Reshaping Patient-Expert Dynamics, Long-term Care, and Follow-up Care. Our findings revealed that students' acceptance of LLMs varied by scenario, with participants highlighting both potential benefits, such as proactive engagement and personalized follow-up care, and concerns, including limitations in training data and emotional support. These insights inform how AI technology should be designed and implemented to effectively support and enhance students' mental well-being, particularly in scenarios where LLMs can complement traditional methods, while maintaining empathy and respecting individual preferences. 0.643

link

2024-09-26

Expanding Perspectives on Data Privacy: Insights from Rural Togo

Passively collected "big" data sources are increasingly used to inform critical development policy decisions in low- and middle-income countries. While prior work highlights how such approaches may reveal sensitive information, enable surveillance, and centralize power, less is known about the corresponding privacy concerns, hopes, and fears of the people directly impacted by these policies -- people sometimes referred to as experiential experts. 0.537 To understand the perspectives of experiential experts, we conducted semi-structured interviews with people living in rural villages in Togo shortly after an entirely digital cash transfer program was launched that used machine learning and mobile phone metadata to determine program eligibility. This paper documents participants' privacy concerns surrounding the introduction of big data approaches in development policy. 0.507 We find that the privacy concerns of our experiential experts differ from those raised by privacy and development domain experts. 0.511 To facilitate a more robust and constructive account of privacy, we discuss implications for policies and designs that take seriously the privacy concerns raised by both experiential experts and domain experts. 0.522

link

2024-09-26

Open Digital Rights Enforcement Framework (ODRE): from descriptive to enforceable policies

From centralised platforms to decentralised ecosystems, like Data Spaces, sharing data has become a paramount challenge. 0.525 For this reason, the definition of data usage policies has become crucial in these domains, highlighting the necessity of effective policy enforcement mechanisms. The Open Digital Rights Language (ODRL) is a W3C standard ontology designed to describe data usage policies; however, it lacks built-in enforcement capabilities, limiting its practical application. This paper introduces the Open Digital Rights Enforcement (ODRE) framework, whose goal is to provide ODRL with enforcement capabilities. 0.57 The ODRE framework proposes a novel approach to express ODRL policies that integrates the descriptive ontology terms of ODRL with other languages that allow behaviour specification, such as dynamic data handling or function evaluation. The framework includes an enforcement algorithm for ODRL policies and two open-source implementations in Python and Java. The ODRE framework is also designed to support future extensions of ODRL to specific domain scenarios. In addition, current limitations of ODRE, ODRL, and current challenges are reported. Finally, to demonstrate the enforcement capabilities of the implementations, their performance, and their extensibility features, several experiments have been carried out with positive results.

link

2024-09-26

Learning Occlusion-aware Decision-making from Agent Interaction via Active Perception

Occlusion-aware decision-making is essential in autonomous driving due to the high uncertainty of various occlusions. Recent occlusion-aware decision-making methods encounter issues such as high computational complexity, scenario scalability challenges, or reliance on limited expert data. Benefiting from automatically generating data by exploration randomization, we uncover that reinforcement learning (RL) may show promise in occlusion-aware decision-making. However, previous occlusion-aware RL faces challenges in expanding to various dynamic and static occlusion scenarios, low learning efficiency, and lack of predictive ability. To address these issues, we introduce Pad-AI, a self-reinforcing framework to learn occlusion-aware decision-making through active perception. Pad-AI utilizes vectorized representation to represent occluded environments efficiently and learns over the semantic motion primitives to focus on high-level active perception exploration. Furthermore, Pad-AI integrates prediction and RL within a unified framework to provide risk-aware learning and security guarantees. 0.546 Our framework was tested in challenging scenarios under both dynamic and static occlusions and demonstrated efficient and general perception-aware exploration performance compared to other strong baselines in closed-loop evaluations.

link

2024-09-26

Digital Twin Ecosystem for Oncology Clinical Operations

Artificial Intelligence (AI) and Large Language Models (LLMs) hold significant promise in revolutionizing healthcare, especially in clinical applications. Simultaneously, Digital Twin technology, which models and simulates complex systems, has gained traction in enhancing patient care. However, despite the advances in experimental clinical settings, the potential of AI and digital twins to streamline clinical operations remains largely untapped. 0.583 This paper introduces a novel digital twin framework specifically designed to enhance oncology clinical operations. We propose the integration of multiple specialized digital twins, such as the Medical Necessity Twin, Care Navigator Twin, and Clinical History Twin, to enhance workflow efficiency and personalize care for each patient based on their unique data. Furthermore, by synthesizing multiple data sources and aligning them with the National Comprehensive Cancer Network (NCCN) guidelines, we create a dynamic Cancer Care Path, a continuously evolving knowledge base that enables these digital twins to provide precise, tailored clinical recommendations.

link

2024-09-26

A Fuzzy-based Approach to Predict Human Interaction by Functional Near-Infrared Spectroscopy

The paper introduces a Fuzzy-based Attention (Fuzzy Attention Layer) mechanism, a novel computational approach to enhance the interpretability and efficacy of neural models in psychological research. The proposed Fuzzy Attention Layer mechanism is integrated as a neural network layer within the Transformer Encoder model to facilitate the analysis of complex psychological phenomena through neural signals, such as those captured by functional Near-Infrared Spectroscopy (fNIRS). By leveraging fuzzy logic, the Fuzzy Attention Layer is capable of learning and identifying interpretable patterns of neural activity. This capability addresses a significant challenge when using Transformer: the lack of transparency in determining which specific brain activities most contribute to particular predictions. Our experimental results demonstrated on fNIRS data from subjects engaged in social interactions involving handholding reveal that the Fuzzy Attention Layer not only learns interpretable patterns of neural activity but also enhances model performance. Additionally, the learned patterns provide deeper insights into the neural correlates of interpersonal touch and emotional exchange. The application of our model shows promising potential in deciphering the subtle complexities of human social behaviors, thereby contributing significantly to the fields of social neuroscience and psychological AI. 0.665

link

2024-09-26

Confidence intervals uncovered: Are we ready for real-world medical imaging AI?

Medical imaging is spearheading the AI transformation of healthcare. 0.649 Performance reporting is key to determine which methods should be translated into clinical practice. Frequently, broad conclusions are simply derived from mean performance values. In this paper, we argue that this common practice is often a misleading simplification as it ignores performance variability. Our contribution is threefold. (1) Analyzing all MICCAI segmentation papers (n = 221) published in 2023, we first observe that more than 50% of papers do not assess performance variability at all. Moreover, only one (0.5%) paper reported confidence intervals (CIs) for model performance. (2) To address the reporting bottleneck, we show that the unreported standard deviation (SD) in segmentation papers can be approximated by a second-order polynomial function of the mean Dice similarity coefficient (DSC). Based on external validation data from 56 previous MICCAI challenges, we demonstrate that this approximation can accurately reconstruct the CI of a method using information provided in publications. (3) Finally, we reconstructed 95% CIs around the mean DSC of MICCAI 2023 segmentation papers. The median CI width was 0.03, which is three times larger than the median performance gap between the first and second ranked method. For more than 60% of papers, the mean performance of the second-ranked method was within the CI of the first-ranked method. We conclude that current publications typically do not provide sufficient evidence to support which models could potentially be translated into clinical practice.

link
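
To make the CI argument in the abstract above concrete, here is a minimal sketch of a normal-approximation 95% confidence interval around a mean DSC, followed by a check of whether a runner-up's mean falls inside it. This is an assumption-laden illustration: the paper's polynomial SD approximation is not reproduced, its CI construction may differ, and all numbers are hypothetical.

```python
import math


def normal_ci(mean, sd, n, z=1.96):
    """Normal-approximation confidence interval for a sample mean.

    z=1.96 corresponds to a two-sided 95% interval.
    """
    half_width = z * sd / math.sqrt(n)
    return (mean - half_width, mean + half_width)


def within_ci(value, ci):
    """True if value lies inside the closed interval ci."""
    lower, upper = ci
    return lower <= value <= upper


# Hypothetical numbers, not taken from the paper: the top-ranked method has
# mean DSC 0.85 with SD 0.10 over 100 test cases; the runner-up scores 0.84.
ci_first = normal_ci(mean=0.85, sd=0.10, n=100)
runner_up_inside = within_ci(0.84, ci_first)
```

When the runner-up's mean lies inside the winner's interval, as here, the reported ranking alone is weak evidence that one method is better.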

2024-09-26

DREAMS: A python framework to train deep learning models with model card reporting for medical and health applications

Electroencephalography (EEG) data provides a non-invasive method for researchers and clinicians to observe brain activity in real time. The integration of deep learning techniques with EEG data has significantly improved the ability to identify meaningful patterns, leading to valuable insights for both clinical and research purposes. However, most of the frameworks designed so far for EEG data analysis are either too focused on pre-processing or on deep learning methods per se, making their use problematic for both the clinician and developer communities. Moreover, critical issues such as ethical considerations, biases, uncertainties, and the limitations inherent in AI models for EEG data analysis are frequently overlooked, posing challenges to the responsible implementation of these technologies. 0.558 In this paper, we introduce a comprehensive deep learning framework tailored for EEG data processing, model training and report generation. While constructed in a way that lets AI developers adapt and develop it further, it enables reporting, through model cards, of the outcome and specific information of use for both developers and clinicians. 0.639 In this way, we discuss how this framework can, in the future, provide clinical researchers and developers with the tools needed to create transparent and accountable AI models for EEG data analysis and diagnosis. 0.503

link

2024-09-26

Hypergame Theory for Decentralized Resource Allocation in Multi-user Semantic Communications

Semantic communications (SC) is an emerging communication paradigm in which wireless devices can send only relevant information from a source of data while relying on computing resources to regenerate missing data points. However, the design of a multi-user SC system becomes more challenging because of the computing and communication overhead required for coordination. Existing solutions for learning the semantic language and performing resource allocation often fail to capture the computing and communication tradeoffs involved in multiuser SC. To address this gap, a novel framework for decentralized computing and communication resource allocation in multiuser SC systems is proposed. The challenge of efficiently allocating communication and computing resources (for reasoning) in a decentralized manner to maximize the quality of task experience for the end users is addressed through the application of Stackelberg hyper game theory. Leveraging the concept of second-level hyper games, novel analytical formulations are developed to model misperceptions of the users about each other's communication and control strategies. 0.526 Further, equilibrium analysis of the learned resource allocation protocols examines the convergence of the computing and communication strategies to a local Stackelberg equilibrium, considering misperceptions. Simulation results show that the proposed Stackelberg hyper game results in efficient usage of communication and computing resources while maintaining a high quality of experience for the users compared to state-of-the-art that does not account for the misperceptions.

link

2024-09-26

Explaining Explaining

Explanation is key to people having confidence in high-stakes AI systems. 0.691 However, machine-learning-based systems - which account for almost all current AI - can't explain because they are usually black boxes. 0.639 The explainable AI (XAI) movement hedges this problem by redefining "explanation". 0.57 The human-centered explainable AI (HCXAI) movement identifies the explanation-oriented needs of users but can't fulfill them because of its commitment to machine learning. 0.652 In order to achieve the kinds of explanations needed by real people operating in critical domains, we must rethink how to approach AI. 0.731 We describe a hybrid approach to developing cognitive agents that uses a knowledge-based infrastructure supplemented by data obtained through machine learning when applicable. These agents will serve as assistants to humans who will bear ultimate responsibility for the decisions and actions of the human-robot team. 0.62 We illustrate the explanatory potential of such agents using the under-the-hood panels of a demonstration system in which a team of simulated robots collaborates on a search task assigned by a human. 0.517

link

2024-09-26

Visual Data Diagnosis and Debiasing with Concept Graphs

The widespread success of deep learning models today is owed to the curation of extensive datasets significant in size and complexity. However, such models frequently pick up inherent biases in the data during the training process, leading to unreliable predictions. Diagnosing and debiasing datasets is thus a necessity to ensure reliable model performance. In this paper, we present CONBIAS, a novel framework for diagnosing and mitigating Concept co-occurrence Biases in visual datasets. CONBIAS represents visual datasets as knowledge graphs of concepts, enabling meticulous analysis of spurious concept co-occurrences to uncover concept imbalances across the whole dataset. Moreover, we show that by employing a novel clique-based concept balancing strategy, we can mitigate these imbalances, leading to enhanced performance on downstream tasks. Extensive experiments show that data augmentation based on a balanced concept distribution augmented by CONBIAS improves generalization performance across multiple datasets compared to state-of-the-art methods. We will make our code and data publicly available. 0.551

link

2024-09-26

AI-Powered Augmented Reality for Satellite Assembly, Integration and Test

The integration of Artificial Intelligence (AI) and Augmented Reality (AR) is set to transform satellite Assembly, Integration, and Testing (AIT) processes by enhancing precision, minimizing human error, and improving operational efficiency in cleanroom environments. 0.546 This paper presents a technical description of the European Space Agency's (ESA) project "AI for AR in Satellite AIT," which combines real-time computer vision and AR systems to assist technicians during satellite assembly. Leveraging Microsoft HoloLens 2 as the AR interface, the system delivers context-aware instructions and real-time feedback, tackling the complexities of object recognition and 6D pose estimation in AIT workflows. All AI models demonstrated over 70% accuracy, with the detection model exceeding 95% accuracy, indicating a high level of performance and reliability. 0.543 A key contribution of this work lies in the effective use of synthetic data for training AI models in AR applications, addressing the significant challenges of obtaining real-world datasets in highly dynamic satellite environments, as well as the creation of the Segmented Anything Model for Automatic Labelling (SAMAL), which facilitates the automatic annotation of real data, achieving speeds up to 20 times faster than manual human annotation. The findings demonstrate the efficacy of AI-driven AR systems in automating critical satellite assembly tasks, setting a foundation for future innovations in the space industry. 0.539

link

2024-09-26

Slowly Scaling Per-Record Differential Privacy

We develop formal privacy mechanisms for releasing statistics from data with many outlying values, such as income data. These mechanisms ensure that a per-record differential privacy guarantee degrades slowly in the protected records' influence on the statistics being released. Formal privacy mechanisms generally add randomness, or "noise," to published statistics. If a noisy statistic's distribution changes little with the addition or deletion of a single record in the underlying dataset, an attacker looking at this statistic will find it plausible that any particular record was present or absent, preserving the records' privacy. More influential records -- those whose addition or deletion would change the statistics' distribution more -- typically suffer greater privacy loss. The per-record differential privacy framework quantifies these record-specific privacy guarantees, but existing mechanisms let these guarantees degrade rapidly (linearly or quadratically) with influence. While this may be acceptable in cases with some moderately influential records, it results in unacceptably high privacy losses when records' influence varies widely, as is common in economic data. We develop mechanisms with privacy guarantees that instead degrade as slowly as logarithmically with influence. 0.502 These mechanisms allow for the accurate, unbiased release of statistics, while providing meaningful protection for highly influential records. As an example, we consider the private release of sums of unbounded establishment data such as payroll, where our mechanisms extend meaningful privacy protection even to very large establishments. We evaluate these mechanisms empirically and demonstrate their utility.

link

2024-09-25

Robo-Platform: A Robotic System for Recording Sensors and Controlling Robots

Mobile smartphones compactly provide sensors such as cameras, IMUs, GNSS measurement units, and wireless and wired communication channels required for robotics projects. They are affordable, portable, and programmable, which makes them ideal for testing, data acquisition, controlling mobile robots, and many other robotic applications. A robotic system is proposed in this paper, consisting of an Android phone, a microcontroller board attached to the phone via USB, and a remote wireless controller station. 0.527 In the data acquisition mode, the Android device can record a dataset of a diverse configuration of multiple cameras, IMUs, GNSS units, and external USB ADC channels in the rawest format used for, but not limited to, pose estimation and scene reconstruction applications. In robot control mode, the Android phone, a microcontroller board, and other peripherals constitute the mobile or stationary robotic system. This system is controlled using a remote server connected over Wi-Fi or Bluetooth. Experiments show that although the SLAM and AR applications can utilize the acquired data, the proposed system can pave the way for more advanced algorithms for processing these noisy and sporadic measurements. Moreover, the characteristics of the communication media are studied, and two example robotic projects, which involve controlling a toy car and a quadcopter, are included. 0.512

link

2024-09-25

Navigating the Maze of Explainable AI: A Systematic Approach to Evaluating Methods and Metrics

Explainable AI (XAI) is a rapidly growing domain with a myriad of proposed methods as well as metrics aiming to evaluate their efficacy. 0.668 However, current studies are often of limited scope, examining only a handful of XAI methods and ignoring underlying design parameters for performance, such as the model architecture or the nature of input data. Moreover, they often rely on one or a few metrics and neglect thorough validation, increasing the risk of selection bias and ignoring discrepancies among metrics. These shortcomings leave practitioners confused about which method to choose for their problem. In response, we introduce LATEC, a large-scale benchmark that critically evaluates 17 prominent XAI methods using 20 distinct metrics. We systematically incorporate vital design parameters like varied architectures and diverse input modalities, resulting in 7,560 examined combinations. Through LATEC, we showcase the high risk of conflicting metrics leading to unreliable rankings and consequently propose a more robust evaluation scheme. Further, we comprehensively evaluate various XAI methods to assist practitioners in selecting appropriate methods aligning with their needs. Curiously, the emerging top-performing method, Expected Gradients, is not examined in any relevant related study. LATEC reinforces its role in future XAI research by publicly releasing all 326k saliency maps and 378k metric scores as a (meta-)evaluation dataset.

link

2024-09-25

Enhancing Feature Selection and Interpretability in AI Regression Tasks Through Feature Attribution

Research in Explainable Artificial Intelligence (XAI) is increasing, aiming to make deep learning models more transparent. 0.596 Most XAI methods focus on justifying the decisions made by Artificial Intelligence (AI) systems in security-relevant applications. 0.655 However, relatively little attention has been given to using these methods to improve the performance and robustness of deep learning algorithms. Additionally, much of the existing XAI work primarily addresses classification problems. In this study, we investigate the potential of feature attribution methods to filter out uninformative features in input data for regression problems, thereby improving the accuracy and stability of predictions. We introduce a feature selection pipeline that combines Integrated Gradients with k-means clustering to select an optimal set of variables from the initial data space. To validate the effectiveness of this approach, we apply it to a real-world industrial problem - blade vibration analysis in the development process of turbo machinery.

link
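
The abstract above pairs Integrated Gradients with k-means clustering for feature selection. The sketch below illustrates only the attribution step on a toy regression function with a known gradient. The function, inputs, and the final thresholding are hypothetical, and the threshold is a plain stand-in for the paper's k-means step, which is not reproduced here.

```python
def integrated_gradients(grad_fn, x, baseline, steps=50):
    """Approximate Integrated Gradients with a midpoint Riemann sum.

    grad_fn returns the gradient of the model output with respect to its
    input; x and baseline are lists of equal length.
    """
    avg_grads = [0.0] * len(x)
    for step in range(steps):
        alpha = (step + 0.5) / steps
        # Point on the straight path from the baseline to the input.
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(grad_fn(point)):
            avg_grads[i] += g / steps
    # Scale the path-averaged gradient by the input difference.
    return [(xi - b) * g for xi, b, g in zip(x, baseline, avg_grads)]


# Toy regression function f(x) = x0^2 + 3*x1; feature x2 is uninformative.
def f(x):
    return x[0] ** 2 + 3.0 * x[1]


def grad_f(x):
    return [2.0 * x[0], 3.0, 0.0]


x = [2.0, 1.0, 5.0]
baseline = [0.0, 0.0, 0.0]
attributions = integrated_gradients(grad_f, x, baseline)

# Stand-in for the clustering step: keep features with non-negligible
# attribution magnitude.
informative = [i for i, a in enumerate(attributions) if abs(a) > 1e-9]
```

The attributions sum to f(x) - f(baseline), the completeness property of Integrated Gradients, and the unused third feature receives zero attribution.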

2024-09-25

Automating Traffic Model Enhancement with AI Research Agent

Developing efficient traffic models is essential for optimizing transportation systems, yet current approaches remain time-intensive and susceptible to human errors due to their reliance on manual processes. Traditional workflows involve exhaustive literature reviews, formula optimization, and iterative testing, leading to inefficiencies in research. In response, we introduce the Traffic Research Agent (TR-Agent), an AI-driven system designed to autonomously develop and refine traffic models through an iterative, closed-loop process. 0.54 Specifically, we divide the research pipeline into four key stages: idea generation, theory formulation, theory evaluation, and iterative optimization; and construct TR-Agent with four corresponding modules: Idea Generator, Code Generator, Evaluator, and Analyzer. Working in synergy, these modules retrieve knowledge from external resources, generate novel ideas, implement and debug models, and finally assess them on the evaluation datasets. Furthermore, the system continuously refines these models based on iterative feedback, enhancing research efficiency and model performance. Experimental results demonstrate that TR-Agent achieves significant performance improvements across multiple traffic models, including the Intelligent Driver Model (IDM) for car following, the MOBIL lane-changing model, and the Lighthill-Whitham-Richards (LWR) traffic flow model. Additionally, TR-Agent provides detailed explanations for its optimizations, allowing researchers to verify and build upon its improvements easily. This flexibility makes the framework a powerful tool for researchers in transportation and beyond. To further support research and collaboration, we have open-sourced both the code and data used in our experiments, facilitating broader access and enabling continued advancements in the field.

link
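
For readers unfamiliar with the Intelligent Driver Model named in the abstract above, here is a minimal sketch of the standard IDM acceleration rule, the kind of baseline formula an agent like TR-Agent would start from. It is not taken from the paper: the function name and all parameter values are illustrative assumptions, and clamping the dynamic part of the desired gap at zero is a common practical variant rather than part of every formulation.

```python
import math

def idm_acceleration(v, gap, dv, v0=30.0, T=1.5, a=1.0, b=2.0, s0=2.0, delta=4.0):
    """Standard IDM acceleration for a following vehicle (illustrative parameters).

    v     : follower speed (m/s)
    gap   : bumper-to-bumper distance to the leader (m), must be > 0
    dv    : approach rate, follower speed minus leader speed (m/s)
    v0    : desired speed, T: desired time headway, a: maximum acceleration,
    b     : comfortable deceleration, s0: minimum gap, delta: exponent
    """
    # Desired dynamic gap; the max(0, ...) clamp is a common variant (assumption).
    s_star = s0 + max(0.0, v * T + v * dv / (2.0 * math.sqrt(a * b)))
    # Free-road term minus interaction term.
    return a * (1.0 - (v / v0) ** delta - (s_star / gap) ** 2)

# A vehicle at rest on an essentially empty road accelerates at about `a`.
free_road = idm_acceleration(v=0.0, gap=1e9, dv=0.0)
# A vehicle closing quickly on a nearby leader brakes (negative acceleration).
closing_in = idm_acceleration(v=20.0, gap=10.0, dv=5.0)
```

The two terms inside the parentheses are what formula-level refinements typically modify: the first governs free-road behaviour, the second the interaction with the leader.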

2024-09-25

Multi-Robot Informative Path Planning for Efficient Target Mapping using Deep Reinforcement Learning

Autonomous robots are being employed in several mapping and data collection tasks due to their efficiency and low labor costs. In these tasks, the robots are required to map targets-of-interest in an unknown environment while constrained to a given resource budget such as path length or mission time. This is a challenging problem as each robot not only has to detect and avoid collisions with static obstacles in the environment but also has to model other robots' trajectories to avoid inter-robot collisions. We propose a novel deep reinforcement learning approach for multi-robot informative path planning to map targets-of-interest in an unknown 3D environment. A key aspect of our approach is an augmented graph that models other robots' trajectories to enable planning for communication and inter-robot collision avoidance. 0.548 We train our decentralized reinforcement learning policy via the centralized training and decentralized execution paradigm. Once trained, our policy also scales to a varying number of robots and does not require re-training. Our approach outperforms other state-of-the-art multi-robot target mapping approaches by 33.75% in terms of the number of discovered targets-of-interest. We open-source our code and model at: https://github.com/AccGen99/marl_ipp

link

2024-09-25

Towards User-Focused Research in Training Data Attribution for Human-Centered Explainable AI

While Explainable AI (XAI) aims to make AI understandable and useful to humans, it has been criticised for relying too much on formalism and solutionism, focusing more on mathematical soundness than user needs. 0.679 We propose an alternative to this bottom-up approach inspired by design thinking: the XAI research community should adopt a top-down, user-focused perspective to ensure user relevance. We illustrate this with a relatively young subfield of XAI, Training Data Attribution (TDA). With the surge in TDA research and growing competition, the field risks repeating the same patterns of solutionism. We conducted a needfinding study with a diverse group of AI practitioners to identify potential user needs related to TDA. 0.514 Through interviews (N=10) and a systematic survey (N=31), we uncovered new TDA tasks that are currently largely overlooked. We invite the TDA and XAI communities to consider these novel tasks and improve the user relevance of their research outcomes.

link

2024-09-25

Models Can and Should Embrace the Communicative Nature of Human-Generated Math

Math is constructed by people for people: just as natural language corpora reflect not just propositions but the communicative goals of language users, the math data that models are trained on reflects not just idealized mathematical entities but rich communicative intentions. While there are important advantages to treating math in a purely symbolic manner, we here hypothesize that there are benefits to treating math as situated linguistic communication and that language models are well suited for this goal, in ways that are not fully appreciated. We illustrate these points with two case studies. First, we ran an experiment in which we found that language models interpret the equals sign in a humanlike way -- generating systematically different word problems for the same underlying equation arranged in different ways. Second, we found that language models prefer proofs to be ordered in naturalistic ways, even though other orders would be logically equivalent. We advocate for AI systems that learn from and represent the communicative intentions latent in human-generated math. 0.548

link

2024-09-25

Text2CAD: Generating Sequential CAD Models from Beginner-to-Expert Level Text Prompts

Prototyping complex computer-aided design (CAD) models in modern software can be very time-consuming. This is due to the lack of intelligent systems that can quickly generate simpler intermediate parts. We propose Text2CAD, the first AI framework for generating text-to-parametric CAD models using designer-friendly instructions for all skill levels. Furthermore, we introduce a data annotation pipeline for generating text prompts based on natural language instructions for the DeepCAD dataset using Mistral and LLaVA-NeXT. The dataset contains $\sim170$K models and $\sim660$K text annotations, from abstract CAD descriptions (e.g., generate two concentric cylinders) to detailed specifications (e.g., draw two circles with center $(x,y)$ and radius $r_{1}$, $r_{2}$, and extrude along the normal by $d$...). Within the Text2CAD framework, we propose an end-to-end transformer-based auto-regressive network to generate parametric CAD models from input texts. We evaluate the performance of our model through a mixture of metrics, including visual quality, parametric precision, and geometrical accuracy. Our proposed framework shows great potential in AI-aided design applications. 0.509 Our source code and annotations will be publicly available.

link

2024-09-25

Small data deep learning methodology for in-field disease detection

Early detection of diseases in crops is essential to prevent harvest losses and improve the quality of the final product. In this context, the combination of machine learning and proximity sensors is emerging as a technique capable of achieving this detection efficiently and effectively. For example, this machine learning approach has been applied to potato crops -- to detect late blight (Phytophthora infestans) -- and grapevine crops -- to detect downy mildew. However, most of these AI models found in the specialised literature have been developed using leaf-by-leaf images taken in the lab, which does not represent field conditions and limits their applicability. 0.538 In this study, we present the first machine learning model capable of detecting mild symptoms of late blight in potato crops through the analysis of high-resolution RGB images captured directly in the field, overcoming the limitations of other publications in the literature and presenting real-world applicability. Our proposal exploits the availability of high-resolution images via the concept of patching, and is based on deep convolutional neural networks with a focal loss function, which makes the model focus on the complex patterns that arise in field conditions. Additionally, we present a data augmentation scheme that facilitates the training of these neural networks with few high-resolution images, which allows for the development of models under the small data paradigm. Our model correctly detects all cases of late blight in the test dataset, demonstrating a high level of accuracy and effectiveness in identifying early symptoms. These promising results reinforce the potential use of machine learning for the early detection of diseases and pests in agriculture, enabling better treatment and reducing their impact on crops.

link
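
The abstract above relies on a focal loss to steer training toward hard examples. As a rough illustration of that idea (not the authors' implementation), the sketch below computes the usual binary focal loss for a single prediction. The function name is hypothetical, and the values `gamma=2.0` and `alpha=0.25` are commonly used defaults assumed here; the abstract does not state which settings the paper uses.

```python
import math

def binary_focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-7):
    """Focal loss for one binary prediction.

    p     : predicted probability of the positive class
    y     : ground-truth label, 0 or 1
    gamma : focusing parameter; larger values down-weight easy examples more
    alpha : weight for the positive class (1 - alpha for the negative class)
    """
    p = min(max(p, eps), 1.0 - eps)          # clip to avoid log(0)
    p_t = p if y == 1 else 1.0 - p           # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # Cross-entropy scaled by (1 - p_t)^gamma, so confident correct
    # predictions contribute very little to the total loss.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

easy = binary_focal_loss(0.9, 1)   # confident and correct: small loss
hard = binary_focal_loss(0.1, 1)   # confident and wrong: much larger loss
```

With `gamma=0` the modulating factor disappears and the expression reduces to class-weighted cross-entropy, which makes the role of the focusing term easy to check in isolation.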

2024-09-24

Enhancing IoT based Plant Health Monitoring through Advanced Human Plant Interaction using Large Language Models and Mobile Applications

This paper presents the development of a novel plant communication application that allows plants to "talk" to humans using real-time sensor data and AI-powered language models. 0.515 Utilizing soil sensors that track moisture, temperature, and nutrient levels, the system feeds this data into the Gemini API, where it is processed and transformed into natural language insights about the plant's health and "mood." Developed using Flutter, Firebase, and ThingSpeak, the app offers a seamless user experience with real-time interaction capabilities. By fostering human-plant connectivity, this system enhances plant care practices, promotes sustainability, and introduces innovative applications for AI and IoT technologies in both personal and agricultural contexts. 0.557 The paper explores the technical architecture, system integration, and broader implications of AI-driven plant communication. 0.665

link

2024-09-24

Toward Scalable and Efficient Visual Data Transmission in 6G Networks

6G network technology will emerge in a landscape where visual data transmissions dominate global mobile traffic and are expected to grow continuously, driven by the increasing demand for AI-based computer vision applications. This will make the already challenging task of visual data transmission even more difficult. In this work, we review effective techniques for visual data transmission, such as content compression and adaptive video streaming, highlighting their advantages and limitations. Further, considering the scalability and cost issues of cloud-based and on-device AI services, we explore distributed in-network computing architectures such as fog computing as a direction for 6G networks, and investigate the necessary technical properties for the timely delivery of visual data. 0.502

link

2024-09-24

The Digital Transformation in Health: How AI Can Improve the Performance of Health Systems

Mobile health has the potential to revolutionize health care delivery and patient engagement. In this work, we discuss how integrating Artificial Intelligence into digital health applications (focused on supply chain, patient management, and capacity building, among other use cases) can improve the health system and public health performance. 0.598 We present an Artificial Intelligence and Reinforcement Learning platform that allows the delivery of adaptive interventions whose impact can be optimized through experimentation and real-time monitoring. The system can integrate multiple data sources and digital health applications. The flexibility of this platform to connect to various mobile health applications and digital devices and send personalized recommendations based on past data and predictions can significantly improve the impact of digital tools on health system outcomes. The potential for resource-poor settings, where the impact of this approach on health outcomes could be more decisive, is discussed specifically. This framework is, however, similarly applicable to improving efficiency in health systems where scarcity is not an issue.

link

2024-09-24

The anonymization problem in social networks

In this paper we introduce a general version of the anonymization problem in social networks, in which the goal is to maximize the number of anonymous nodes by altering a given graph. We define three variants of this optimization problem: full, partial and budgeted anonymization. In each, the objective is to maximize the number of k-anonymous nodes, i.e., nodes for which there are at least k-1 equivalent nodes, according to a particular anonymity measure of structural node equivalence. We propose six new heuristic algorithms for solving the anonymization problem which we implement into the reusable ANO-NET computational framework. As a baseline, we use an edge sampling method introduced in previous work. Experiments on both graph models and 17 real-world network datasets result in three empirical findings. First, we demonstrate that edge deletion is the most effective graph alteration operation. Second, we compare four commonly used anonymity measures from the literature and highlight how the choice of anonymity measure has a tremendous effect on both the achieved anonymity and the difficulty of solving the anonymization problem. Third, we find that the proposed algorithms that preferentially delete edges with a larger effect on nodes at a structurally unique position consistently outperform heuristics solely based on network structure. With similar runtimes, our algorithms retain on average 17 times more edges, ensuring higher data utility after full anonymization. In the budgeted variant, they achieve 4.4 times more anonymous nodes than the baseline. This work lays important foundations for future development of algorithms for anonymizing social networks. 0.509

link
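
To make the k-anonymity objective in the abstract above concrete, the sketch below counts k-anonymous nodes under one simple anonymity measure, node degree: a node is k-anonymous if at least k-1 other nodes have the same degree. Using degree is an assumption chosen for brevity; the abstract does not say which four measures the paper compares, and the function name and toy graph are hypothetical. This is not the ANO-NET implementation.

```python
from collections import Counter

def k_anonymous_nodes(nodes, edges, k):
    """Return the nodes whose degree is shared by at least k-1 other nodes."""
    degree = {node: 0 for node in nodes}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # Size of each equivalence class, where equivalence means "same degree".
    class_size = Counter(degree.values())
    return {node for node, d in degree.items() if class_size[d] >= k}

# Hypothetical toy graph: the path a-b-c-d plus a pendant edge c-e.
nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("c", "e")]

before = k_anonymous_nodes(nodes, edges, k=2)

# Delete the edge (c, e) and recount.
altered = [edge for edge in edges if edge != ("c", "e")]
after = k_anonymous_nodes(nodes, altered, k=2)
```

In this toy case, deleting a single edge raises the number of 2-anonymous nodes from three to four, which is the quantity the full, partial and budgeted variants all try to maximize.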

2024-09-24

Cyber Knowledge Completion Using Large Language Models

The integration of the Internet of Things (IoT) into Cyber-Physical Systems (CPSs) has expanded their cyber-attack surface, introducing new and sophisticated threats with potential to exploit emerging vulnerabilities. 0.526 Assessing the risks of CPSs is increasingly difficult due to incomplete and outdated cybersecurity knowledge. This highlights the urgent need for better-informed risk assessments and mitigation strategies. While previous efforts have relied on rule-based natural language processing (NLP) tools to map vulnerabilities, weaknesses, and attack patterns, recent advancements in Large Language Models (LLMs) present a unique opportunity to enhance cyber-attack knowledge completion through improved reasoning, inference, and summarization capabilities. We apply embedding models to encapsulate information on attack patterns and adversarial techniques, generating mappings between them using vector embeddings. Additionally, we propose a Retrieval-Augmented Generation (RAG)-based approach that leverages pre-trained models to create structured mappings between different taxonomies of threat patterns. Further, we use a small hand-labeled dataset to compare the proposed RAG-based approach to a baseline standard binary classification model. Thus, the proposed approach provides a comprehensive framework to address the challenge of cyber-attack knowledge graph completion.

link

2024-09-24

Context-Based Meta Reinforcement Learning for Robust and Adaptable Peg-in-Hole Assembly Tasks

Peg-in-hole assembly in unknown environments is a challenging task due to onboard sensor errors, which result in uncertainty and variations in task parameters such as the hole position and orientation. Meta Reinforcement Learning (Meta RL) has been proposed to mitigate this problem as it learns how to quickly adapt to new tasks with different parameters. However, previous approaches either depend on a sample-inefficient procedure or human demonstrations to perform the task in the real world. Our work modifies the data used by the Meta RL agent and uses simple features that can be easily measured in the real world even with an uncalibrated camera. We further adapt the Meta RL agent to use data from a force/torque sensor, instead of the camera, to perform the assembly, using a small amount of training data. Finally, we propose a fine-tuning method that consistently and safely adapts to out-of-distribution tasks with parameters that differ by a factor of 10 from the training tasks. Our results demonstrate that the proposed data modification significantly enhances the training and adaptation efficiency and enables the agent to achieve 100% success in tasks with different hole positions and orientations. Experiments on a real robot confirm that both camera- and force/torque sensor-equipped agents achieve 100% success in tasks with unknown hole positions, matching their simulation performance and validating the approach's robustness and applicability. 0.506 Compared to the previous work with sample-inefficient adaptation, our proposed methods are 10 times more sample-efficient in the real-world tasks.

link