arXiv Research Digest

April 13, 2026 • 125 papers across 5 interests

🔬

Efficient ML / Edge AI

🟢 Applied

AsymLoc: Towards Asymmetric Feature Matching for Efficient Visual Localization

💡 This research makes visual localization faster with smaller distilled models.

AsymLoc is a novel distillation framework that aligns a student model to its teacher through a combination of a geometry-driven matching objective and a joint detector-descriptor distillation objective, enabling fast, parameter-free nearest-neighbor matching. It achieves up to 95% of the teacher's localization accuracy using models an order of magnitude smaller.

Abstract ↗ PDF ↗
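The parameter-free matching step mentioned above can be illustrated with generic mutual nearest-neighbor matching. This is a minimal sketch under assumed toy inputs (descriptors as plain Python lists, squared Euclidean distance); it is not AsymLoc's implementation, and the function name is hypothetical.

```python
def mutual_nn_matches(desc_a, desc_b):
    """Return index pairs (i, j) where desc_a[i] and desc_b[j] are each other's nearest neighbor."""
    def sq_dist(u, v):
        return sum((x - y) ** 2 for x, y in zip(u, v))

    # Nearest neighbor in B for each descriptor in A, and vice versa.
    nn_ab = [min(range(len(desc_b)), key=lambda j: sq_dist(a, desc_b[j])) for a in desc_a]
    nn_ba = [min(range(len(desc_a)), key=lambda i: sq_dist(b, desc_a[i])) for b in desc_b]
    # Keep only cycle-consistent pairs: no learned matcher, no trainable parameters.
    return [(i, j) for i, j in enumerate(nn_ab) if nn_ba[j] == i]
```

For example, `mutual_nn_matches([[0, 0], [1, 0], [5, 5]], [[0.1, 0], [5, 5.2], [1.1, 0]])` keeps the pairs `(0, 0)`, `(1, 2)` and `(2, 1)`. Because the only rule is cycle consistency, a small student model can rely on it at inference without a learned matcher.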

🟢 Applied

UIPress: Bringing Optical Token Compression to UI-to-Code Generation

💡 This research shortens visual token sequences for UI-to-code generation.

UI-to-Code generation requires vision-language models to produce thousands of tokens of structured HTML/CSS from a single screenshot. Existing compression methods either select tokens at inference time using task-agnostic heuristics, or zero out low-attention features without actually shortening the sequence.

Abstract ↗ PDF ↗

🟢 Applied

Tango: Taming Visual Signals for Efficient Video Large Language Models

💡 This research makes video large language models more efficient through token pruning.

Token pruning has emerged as a mainstream approach for developing efficient Video Large Language Models. This work revisits and advances the two predominant token-pruning paradigms: attention-based selection and similarity-based clustering. We propose Tango, a novel framework designed to optimize the utilization of visual signals.

Abstract ↗ PDF ↗
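The attention-based selection paradigm named above can be sketched as keeping the top-scoring visual tokens. This is a generic illustration, not Tango's method; the per-token attention scores and the `keep_ratio` parameter are assumptions, and the function name is hypothetical.

```python
def prune_tokens_by_attention(tokens, attn_scores, keep_ratio):
    """Keep the visual tokens that receive the most attention, preserving their original order."""
    if len(tokens) != len(attn_scores):
        raise ValueError("need one attention score per token")
    k = max(1, round(len(tokens) * keep_ratio))
    # Indices of the k highest-scoring tokens.
    top = sorted(range(len(tokens)), key=lambda i: attn_scores[i], reverse=True)[:k]
    # Re-sort the kept indices so the tokens stay in their original spatio-temporal order.
    return [tokens[i] for i in sorted(top)]
```

With scores `[0.1, 0.5, 0.05, 0.35]` and `keep_ratio=0.5`, tokens 1 and 3 survive. Similarity-based clustering, the other paradigm, would instead merge redundant tokens rather than drop low-scoring ones.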

🟢 Applied

ECHO: Efficient Chest X-ray Report Generation with One-step Block Diffusion

💡 This research speeds up chest X-ray report generation.

Chest X-ray report generation (CXR-RG) has the potential to substantially alleviate radiologists' workload. However, conventional autoregressive vision-language models suffer from high inference latency due to sequential token decoding. Diffusion-based models offer a promising alternative through parallel generation, but they still require multiple denoising iterations.

Abstract ↗ PDF ↗

🟢 Applied

Integrated electro-optic attention nonlinearities for transformers

💡 This research cuts transformer inference latency with analog electro-optic hardware.

Transformers have emerged as the dominant neural-network architecture, achieving state-of-the-art performance in language processing and computer vision. Softmax operations account for less than 1% of the total operation count, but they can disproportionately bottleneck overall inference latency. We use thin-film lithium niobate (TFLN) Mach-Zehnder modulators as analog nonlinear computational elements to drastically reduce the latency of nonlinear computations.

Abstract ↗ PDF ↗

🟢 Applied

BERT-as-a-Judge: A Robust Alternative to Lexical Methods for Efficient Reference-Based LLM Evaluation

💡 This research makes reference-based evaluation of LLM answers cheaper and more robust.

BERT-as-a-Judge is an encoder-driven approach for assessing answer correctness in reference-based generative settings. It is robust to variations in output phrasing and requires only lightweight training on synthetically annotated question-candidate-reference triplets. It consistently outperforms lexical baselines while matching the performance of much larger LLM judges.

Abstract ↗ PDF ↗
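To see why lexical baselines are brittle to phrasing, here is a minimal exact-match scorer with SQuAD-style answer normalization (lowercasing, stripping punctuation and English articles). It is an illustrative baseline of the kind the summary contrasts against, not the paper's evaluation code.

```python
import re
import string


def normalize(text):
    """Lowercase, strip punctuation and articles, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(candidate, reference):
    """Lexical correctness check: equal only if the normalized strings are identical."""
    return normalize(candidate) == normalize(reference)
```

`exact_match("Paris.", "paris")` is accepted, but the equally correct `"The capital of France is Paris."` is rejected against the reference `"Paris"`. A learned judge is meant to close exactly this gap.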

🟢 Applied

Online3R: Online Learning for Consistent Sequential Reconstruction Based on Geometry Foundation Model

💡 This research adapts a geometry foundation model to new scenes for consistent 3D reconstruction.

Online3R is a new sequential reconstruction framework that adapts to new scenes through online learning, effectively resolving inconsistency issues. We introduce a set of learnable, lightweight visual prompts into a pretrained, frozen geometry foundation model to capture knowledge of new environments.

Abstract ↗ PDF ↗

🟢 Applied

OASIS: Online Activation Subspace Learning for Memory-Efficient Training

💡 This research reduces the memory cost of training large language models.

Training large language models is constrained by memory requirements. OASIS is an online activation subspace learning algorithm for memory-efficient training: it learns an evolving activation subspace during training, and intermediate activations are projected onto this subspace, reducing memory without modifying forward-pass computations.

Abstract ↗ PDF ↗
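The projection idea can be sketched as storing only the coordinates of an activation in a low-dimensional subspace. This minimal sketch assumes a fixed orthonormal basis of k vectors of length d; how OASIS learns and updates its subspace online is not described in the summary and is not modeled here. Function names are hypothetical.

```python
def project(activation, basis):
    """Compress a d-dim activation to k coefficients using an orthonormal basis (k vectors of length d)."""
    return [sum(a * b for a, b in zip(activation, vec)) for vec in basis]


def reconstruct(coeffs, basis):
    """Map the k stored coefficients back to an approximate d-dim activation."""
    d = len(basis[0])
    out = [0.0] * d
    for c, vec in zip(coeffs, basis):
        for i in range(d):
            out[i] += c * vec[i]
    return out
```

With `basis = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]`, the activation `[3.0, -2.0, 0.5, 0.0]` is stored as two numbers, `[3.0, -2.0]`, and reconstructed as `[3.0, -2.0, 0.0, 0.0]`. Memory drops from d to k values per activation; the component outside the subspace (the 0.5) is what is lost.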

🟢 Applied

UHD Low-Light Image Enhancement via Real-Time Enhancement Methods with Clifford Information Fusion

💡 This research enables real-time enhancement of ultra-high-definition low-light images.

Existing methods based on Transformer architectures or complex convolutional neural networks often suffer from the "memory wall" bottleneck. We propose a novel real-time UHD low-light enhancement network based on geometric feature fusion using Clifford algebra in 2D Euclidean space.

Abstract ↗ PDF ↗

🟢 Applied

Efficient Unlearning through Maximizing Relearning Convergence Delay

💡 This research makes machine unlearning harder to reverse.

Removing mislabeled or contaminated data from a pretrained model poses challenges for machine unlearning. A new metric, relearning convergence delay, captures changes in both weight space and prediction space. It can be used to assess the risk that forgotten data can be recovered from the unlearned model.

Abstract ↗ PDF ↗

🟢 Applied

Large Language Models Generate Harmful Content Using a Distinct, Unified Mechanism

💡 This research locates the mechanism language models use to generate harmful content.

Large language models undergo alignment training to avoid harmful behaviors, yet the resulting safeguards remain brittle. Fine-tuning on narrow domains can induce emergent misalignment that generalizes broadly. Aligned models exhibit greater compression of harm-generation weights than their unaligned counterparts.

Abstract ↗ PDF ↗

🟢 Applied

Seeing is Believing: Robust Vision-Guided Cross-Modal Prompt Learning under Label Noise

💡 This research makes prompt learning for vision-language models robust to label noise.

Prompt learning is a parameter-efficient approach for vision-language models, yet its robustness under label noise remains underexplored. Visual content contains richer and more reliable semantic information, whereas the prompt itself is highly susceptible to label noise. We propose VisPrompt, a lightweight and robust vision-guided prompt learning framework for noisy-label settings.

Abstract ↗ PDF ↗

🟢 Applied

Envisioning the Future, One Step at a Time

💡 This research predicts many diverse futures from a single image, quickly.

An autoregressive diffusion model advances trajectories through short, locally predictable transitions, explicitly modeling the growth of uncertainty over time. This dynamics-centric representation enables fast rollout of thousands of diverse futures from a single image while maintaining physical plausibility and long-range coherence.

Abstract ↗ PDF ↗

🟢 Applied

Sim-to-Real Transfer for Muscle-Actuated Robots via Generalized Actuator Networks

💡 This research enables sim-to-real policy transfer for muscle-actuated robots.

Tendon drives paired with soft muscle actuation enable faster and safer robots. Still, these systems are rarely used in practice due to inherent nonlinearities, friction, and hysteresis. So far, these challenges have hindered policy transfer from simulation to real systems.

Abstract ↗ PDF ↗

🟢 Applied

Realizing Immersive Volumetric Video: A Multimodal Framework for 6-DoF VR Engagement

💡 This research introduces a media format and dataset for immersive 6-DoF volumetric video.

Immersive Volumetric Video is a new volumetric media format designed to provide large 6-DoF interaction spaces, audiovisual feedback, and high-resolution, high-frame-rate dynamic content. ImViD is a multi-view, multi-modal dataset built upon a space-oriented capture philosophy.

Abstract ↗ PDF ↗

🟢 Applied

EGLOCE: Training-Free Energy-Guided Latent Optimization for Concept Erasure

💡 This research erases unwanted concepts from image generation without retraining.

Energy-Guided Latent Optimization for Concept Erasure (EGLOCE) removes unwanted concepts by redirecting the noisy latent during inference. EGLOCE improves concept removal while maintaining image quality and prompt alignment.

Abstract ↗ PDF ↗

🟡 Advanced

Variational Quantum Physics-Informed Neural Networks for Hydrological PDE-Constrained Learning with Inherent Uncertainty Quantification

💡 This research brings quantum circuits to physics-informed learning for hydrology.

The Hybrid Quantum-Classical Physics-Informed Neural Network (HQC-PINN) integrates variational quantum circuits into the PINN framework for hydrological PDE-constrained learning. The inherent stochasticity of quantum measurement provides a natural mechanism for uncertainty quantification without requiring Bayesian inference machinery.

Abstract ↗ PDF ↗

🟢 Applied

Transferable FB-GNN-MBE Framework for Potential Energy Surfaces: Data-Adaptive Transfer Learning in Deep Learned Many-Body Expansion Theory

💡 This research transfers learned many-body expansion models for molecular potential energy surfaces.

FB-GNN-MBE can reproduce first-principles potential energy surfaces for hierarchically structured systems with manageable accuracy, complexity, and interpretability. It outperformed conventional non-FBN-based models and showed high practicality for large-scale molecular simulations.

Abstract ↗ PDF ↗

🟡 Advanced

Iterative Identification Closure: Amplifying Causal Identifiability in Linear SEMs

💡 This research extends causal identifiability in linear structural equation models.

The Half-Trek Criterion (HTC) is the primary graphical tool for determining generic identifiability of causal effect coefficients in linear structural equation models. However, HTC is inherently node-wise: it simultaneously resolves all incoming edges of a node, leaving a gap of "inconclusive" causal effects. We introduce a framework that decouples causal identification into two phases: (1) a seed function $S_0$ that identifies an initial set of edges from any external source of …

Abstract ↗ PDF ↗

🟢 Applied

ANTIC: Adaptive Neural Temporal In-situ Compressor

💡 This research compresses large-scale PDE simulation data in situ.

The persistent storage requirements for high-resolution, spatiotemporally evolving fields governed by large-scale, high-dimensional partial differential equations (PDEs) have reached the petabyte-to-exabyte scale. To address this bottleneck, we introduce ANTIC (Adaptive Neural Temporal In-situ Compressor).

Abstract ↗ PDF ↗

🟢 Applied

XFED: Non-Collusive Model Poisoning Attack Against Byzantine-Robust Federated Classifiers

💡 This research attacks federated learning without requiring colluding clients.

Model poisoning attacks pose a significant security threat to Federated Learning (FL). Most existing model poisoning attacks rely on collusion, requiring adversarial clients to coordinate by exchanging local benign models and synchronizing their poisoned updates. To address this challenge, we introduce and formalize the non-collusive attack model.

Abstract ↗ PDF ↗

🟢 Applied

DSVTLA: Deep Swin Vision Transformer-Based Transfer Learning Architecture for Multi-Type Cancer Histopathological Cancer Image Classification

💡 This research classifies histopathology images across multiple cancer types.

In this study, we propose a deep Swin Vision Transformer-based transfer learning architecture for robust multi-cancer image classification. The framework integrates a hierarchical Swin Transformer with ResNet50-based convolutional feature extraction. The model reached 100% test accuracy on the lung-colon cancer and segmented leukemia datasets, and up to 99.23% accuracy for breast cancer classification.

Abstract ↗ PDF ↗

🟢 Applied

Continuous Orthogonal Mode Decomposition: Haptic Signal Prediction in Tactile Internet

💡 This research predicts missing haptic signals for the Tactile Internet.

Mode-Domain Architecture (MDA) is a bilateral predictive neural network architecture designed to restore missing signals on both the human and robot sides. MDA uses a novel Continuous Orthogonal Mode Decomposition framework. The model achieves an ultra-low inference latency of 0.065 ms, outperforming existing baselines.

Abstract ↗ PDF ↗

🟢 Applied

AdaCubic: An Adaptive Cubic Regularization Optimizer for Deep Learning

💡 This research proposes an adaptive second-order optimizer for deep learning.

AdaCubic is an optimizer that solves an auxiliary optimization problem with cubic constraints to dynamically adjust the weight of the cubic term in Newton's cubic-regularized method. We use Hutchinson's method to approximate the Hessian matrix, thereby reducing computational cost.

Abstract ↗ PDF ↗
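Hutchinson's method avoids forming the Hessian by probing it with random sign vectors. The sketch below estimates the Hessian diagonal as the average of z ⊙ (Hz) over Rademacher vectors z, which is unbiased because E[z_i z_j] is 1 when i = j and 0 otherwise. Whether AdaCubic uses the diagonal or the trace variant is not stated in the summary; the `hvp` callback (a Hessian-vector product, usually supplied by automatic differentiation) and the function name are assumptions.

```python
import random


def hutchinson_diagonal(hvp, dim, num_samples=1000, seed=0):
    """Estimate diag(H) as the mean of z * (H z) over Rademacher vectors z.

    hvp(z) must return the Hessian-vector product H z; H itself is never formed.
    """
    rng = random.Random(seed)
    est = [0.0] * dim
    for _ in range(num_samples):
        z = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
        hz = hvp(z)
        for i in range(dim):
            est[i] += z[i] * hz[i]
    return [e / num_samples for e in est]
```

For a diagonal H the estimate is exact after a single sample; off-diagonal entries add zero-mean noise that shrinks as `num_samples` grows.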

🟢 Applied

Multi-task Just Recognizable Difference for Video Coding for Machines: Database, Model, and Coding Application

💡 This research models machine-perceptible distortion thresholds to improve video coding for machines.

Just Recognizable Difference (JRD) boosts coding efficiency for machine vision through visibility threshold modeling, but it is currently limited to single-task scenarios. To address this issue, we propose a Multi-Task JRD (MT-JRD) dataset and an Attribute-assisted MT-JRD model for Video Coding for Machines (VCM).

Abstract ↗ PDF ↗

🔬

Privacy-Preserving ML

🟢 Applied

Trans-RAG: Query-Centric Vector Transformation for Secure Cross-Organizational Retrieval

💡 This research secures retrieval-augmented generation across organizations.

Trans-RAG implements a novel vector space language paradigm in which each organization's knowledge exists in a mathematically isolated semantic space. At its core lies vector2Trans, a multi-stage transformation technique that enables queries to dynamically "speak" each organization's vector space "language".

Abstract ↗ PDF ↗

🟢 Applied

DiffHLS: Differential Learning for High-Level Synthesis QoR Prediction with GNNs and LLM Code Embeddings

💡 This research predicts hardware synthesis quality to speed up design-space exploration.

High-Level Synthesis (HLS) compiles C/C++ into RTL, but exploring pragma-driven optimization choices remains expensive because each design point requires time-consuming synthesis. We propose a differential learning framework for HLS Quality-of-Result (QoR) prediction that learns from kernel-design pairs.

Abstract ↗ PDF ↗

🟢 Applied

DeepGuard: Secure Code Generation via Multi-Layer Semantic Aggregation

💡 This research hardens code-generation models against insecure output.

Large language models for code generation can replicate insecure patterns from their training data. To mitigate this, a common security-hardening strategy is to fine-tune models using supervision derived from the final transformer layer. This design may suffer from a final-layer bottleneck: vulnerability-discriminative cues can be distributed across layers and become less detectable near output representations optimized for next-token prediction.

Abstract ↗ PDF ↗

🟢 Applied

Event-Driven Temporal Graph Networks for Asynchronous Multi-Agent Cyber Defense in NetForge_RL

💡 This research trains multi-agent reinforcement learning defenders for realistic cyber operations.

The transition of Multi-Agent Reinforcement Learning (MARL) policies from simulated cyber wargames to operational Security Operations Centers (SOCs) is fundamentally bottlenecked by the Sim2Real gap. NetForge enforces Zero-Trust Network Access (ZTNA) constraints and requires defenders to process NLP-encoded SIEM telemetry.

Abstract ↗ PDF ↗

🟢 Applied

Stochastic-Dimension Frozen Sampled Neural Network for High-Dimensional Gross-Pitaevskii Equations on Unbounded Domains

💡 This research solves high-dimensional Gross-Pitaevskii equations with randomly sampled neural networks.

In this paper, we propose a stochastic-dimension frozen sampled neural network (SD-FSNN) for solving a class of high-dimensional Gross-Pitaevskii equations. SD-FSNN is unbiased across all dimensions, and its computational cost is independent of the dimension. We randomly sample the hidden weights and biases of the network, which outperforms gradient-based optimization methods in both training time and accuracy.

Abstract ↗ PDF ↗

🟢 Applied

Meta-Learned Basis Adaptation for Parametric Linear PDEs

💡 This research solves families of parametric linear PDEs with meta-learned basis functions.

We propose a hybrid physics-informed framework for solving families of parametric linear partial differential equations (PDEs) by combining a meta-learned predictor with a least-squares corrector. The predictor is a shallow task-conditioned model that maps query coordinates and PDE parameters to solution values while internally generating an interpretable, task-adaptive Gaussian basis geometry. This predictor-generated geometry is transferred to a second-stage corrector, which augments it with a …

Abstract ↗ PDF ↗

🟡 Advanced

Are Independently Estimated View Uncertainties Comparable? Unified Routing for Trusted Multi-View Classification

💡 This research makes uncertainty-based fusion in multi-view classification more reliable.

Trusted multi-view classification typically relies on a view-wise evidential fusion process. Different views often differ in feature space, noise level, and semantic granularity. As a result, the uncertainty used for fusion can be dominated by branch-specific scale bias rather than true sample-level reliability.

Abstract ↗ PDF ↗

🟢 Applied

The causal relation between off-street parking and electric vehicle adoption in Scotland

💡 This research estimates the causal effect of off-street parking on electric vehicle adoption.

The transition to electric mobility hinges on maximising aggregate adoption while facilitating equitable access. The study examines whether the 'charging divide' between households with and without off-street parking reflects a genuine infrastructure constraint or a by-product of socio-economic disparity.

Abstract ↗ PDF ↗

🟢 Applied

Automated Batch Distillation Process Simulation for a Large Hybrid Dataset for Deep Anomaly Detection

💡 This research builds a hybrid dataset for deep anomaly detection in chemical processes.

Anomaly detection (AD) in chemical processes based on deep learning offers significant opportunities but requires large, diverse, and well-annotated training datasets. In a recent work, we introduced a large, fully annotated experimental dataset for batch distillation under normal and anomalous operating conditions. In the present study, we augment this dataset with a corresponding simulation dataset, creating a novel hybrid dataset.

Abstract ↗ PDF ↗

🟢 Applied

CORA: Conformal Risk-Controlled Agents for Safeguarded Mobile GUI Automation

💡 This research adds statistically guaranteed safeguards to mobile GUI agents.

CORA is a post-policy, pre-action safeguarding framework that provides statistical guarantees on harmful executed actions. CORA reformulates safety as selective action execution: we train a Guardian model to estimate the action-conditional risk of each proposed step.

Abstract ↗ PDF ↗
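Selective execution with a statistical guarantee can be illustrated with a simplified split-conformal calibration: pick a risk threshold from Guardian scores on calibration actions known to be harmful, so that a new harmful action is executed with probability at most alpha. This is a sketch under that assumption only; CORA's actual risk definition and calibration procedure are not given in the summary, and all names here are hypothetical.

```python
import math


def calibrate_threshold(harmful_scores, alpha):
    """Pick a risk threshold so a new harmful action slips through with probability <= alpha.

    harmful_scores: Guardian risk scores for calibration actions known to be harmful.
    An action is executed only if its risk score is strictly below the returned threshold.
    """
    n = len(harmful_scores)
    k = math.floor(alpha * (n + 1))
    if k == 0:
        return -math.inf  # too few calibration points for this alpha: block every action
    # k-th smallest harmful score. If the new harmful score is exchangeable with the
    # calibration scores, P(new score < threshold) is at most k / (n + 1) <= alpha.
    return sorted(harmful_scores)[k - 1]


def should_execute(risk_score, threshold):
    """Execute the proposed step only when its estimated risk is below the threshold."""
    return risk_score < threshold
```

With 19 calibration scores and alpha = 0.1, the threshold is the second-smallest harmful score. The guarantee is marginal over calibration draws and relies on exchangeability, which is why the calibration actions should come from the same distribution the agent faces at deployment.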

🟢 Applied

Synthesizing real-world distributions from high-dimensional Gaussian Noise with Fully Connected Neural Network

💡 This research generates synthetic data from Gaussian noise with a fully connected network.

Synthetic data can be used in machine learning applications and research. The proposed solution surpasses state-of-the-art generative methods and reaches reference MMD scores faster than modern deep learning solutions.

Abstract ↗ PDF ↗

🟢 Applied

PDE-regularized Dynamics-informed Diffusion with Uncertainty-aware Filtering for Long-Horizon Dynamics

💡 This research stabilizes long-horizon forecasting of physical dynamics.

PDYffusion is a dynamics-informed diffusion framework that integrates PDE-based regularization and uncertainty-aware forecasting for stable long-term prediction. The method consists of two key components: a PDE-regularized interpolator and a UKF-based forecaster. The interpolator incorporates a differential operator to enforce physically consistent intermediate states, while the forecaster uses the Unscented Kalman Filter to explicitly model uncertainty.

Abstract ↗ PDF ↗

🟢 Applied

Case-Grounded Evidence Verification: A Framework for Constructing Evidence-Sensitive Supervision

💡 This research constructs supervision that makes model decisions depend on evidence.

Evidence-grounded reasoning requires more than attaching text to a prediction: a model should make decisions that depend on whether the provided evidence supports the target claim. In practice, this often fails because supervision is weak, evidence is only loosely tied to the claim, and evaluation does not test evidence dependence directly. A key contribution is a supervision construction procedure that generates explicit support examples paired with semantically controlled non-support examples.

Abstract ↗ PDF ↗

🟢 Applied

Toward World Models for Epidemiology

💡 This research proposes world models for epidemiological decision-making.

World models have emerged as a unifying paradigm for learning latent dynamics, simulating counterfactual futures, and supporting planning under uncertainty. They suit epidemiology because epidemic decision-making requires reasoning about latent disease burden and imperfect, policy-dependent surveillance signals.

Abstract ↗ PDF ↗

🟢 Applied

RecaLLM: Addressing the Lost-in-Thought Phenomenon with Explicit In-Context Retrieval

💡 This research helps reasoning models use long-context information.

We propose RecaLLM, a set of reasoning language models post-trained to make effective use of long-context information. We observe consistent gains at context windows of up to 128K tokens using training samples of at most 10K tokens.

Abstract ↗ PDF ↗

🟢 Applied

SafeAdapt: Provably Safe Policy Updates in Deep Reinforcement Learning

💡 This research provides safety guarantees for policy updates in reinforcement learning.

Safety guarantees are a prerequisite for deploying reinforcement learning agents in safety-critical tasks. Deployment environments often exhibit non-stationary dynamics or are subject to changing performance goals, requiring updates to the learned policy. We propose a novel a priori approach to safe policy updates in continual RL.

Abstract ↗ PDF ↗

🟢 Applied

An Open-Source, Open Data Approach to Activity Classification from Triaxial Accelerometry in an Ambulatory Setting

💡 This research classifies everyday activities from wearable accelerometer data.

Data were collected from 23 healthy subjects (16 male, 7 female) aged between 23 and 62 years using an ambulatory device. Participants followed a standardized activity routine involving five distinct activities: lying, sitting, standing, walking, and jogging.

Abstract ↗ PDF ↗

🟢 Applied

Rays as Pixels: Learning A Joint Distribution of Videos and Camera Trajectories

💡 This research jointly generates videos and camera trajectories.

Rays as Pixels is a video diffusion model that learns a joint distribution over videos and camera trajectories. Each camera is represented as dense ray pixels (raxels), which are denoised jointly with video frames through a Decoupled Self-Cross Attention mechanism.

Abstract ↗ PDF ↗

🟢 Applied

Offline Local Search for Online Stochastic Bandits

💡 This research applies offline local search to online combinatorial bandits.

Combinatorial multi-armed bandits provide a fundamental online decision-making setting in which a decision-maker interacts with an environment across $T$ time steps. The goal is to minimize regret, defined as the cumulative loss relative to the optimal fixed action.

Abstract ↗ PDF ↗
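The regret definition above can be made concrete for a combinatorial bandit with linear rewards, where a super-arm's expected reward is the sum of its base arms' means. This sketch computes plain pseudo-regret against the best fixed super-arm; the linear reward model, the fixed super-arm size `m`, and the function name are assumptions for illustration.

```python
from itertools import combinations


def pseudo_regret(means, chosen_sets, m):
    """Cumulative pseudo-regret for a combinatorial bandit with linear rewards.

    means: expected reward of each base arm.
    chosen_sets: the super-arm (set of base-arm indices) played at each of the T steps.
    m: every feasible super-arm contains exactly m base arms.
    """
    def value(s):
        return sum(means[i] for i in s)

    # Best fixed super-arm; brute-force enumeration is for illustration only.
    best = max(value(s) for s in combinations(range(len(means)), m))
    return sum(best - value(s) for s in chosen_sets)
```

With means `[0.9, 0.5, 0.2]` and `m = 2`, playing `{0, 1}`, `{1, 2}`, `{0, 2}` accrues regret 0 + 0.7 + 0.3 = 1.0. The brute-force maximum is the step that becomes intractable for large combinatorial action spaces, which is where offline approximation methods such as local search come in.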

🟢 Applied

NOMAD: Generating Embeddings for Massive Distributed Graphs

💡 This research scales graph embedding to massive distributed graphs.

Successful machine learning on graphs or networks requires embeddings that preserve the graph structure. NOMAD implements the proximity-based models proposed in the widely used LINE algorithm. We propose several practical trade-offs to improve the scalability and reduce the communication overhead of irregular, distributed graph embedding methods.

Abstract ↗ PDF ↗
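LINE's first-order proximity model scores an edge (i, j) by sigmoid(u_i · u_j) and minimizes the weighted negative log-likelihood over observed edges. The sketch below evaluates that objective on a single machine with a numerically stable log-sigmoid; LINE's second-order model, negative sampling, and NOMAD's distributed partitioning are not shown, and the function names are hypothetical.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(1 / (1 + exp(-x)))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def line_first_order_loss(edges, emb):
    """LINE first-order objective: -sum over edges of w_ij * log(sigmoid(u_i . u_j)).

    edges: iterable of (i, j, weight); emb: dict mapping node -> embedding vector.
    """
    loss = 0.0
    for i, j, w in edges:
        dot = sum(a * b for a, b in zip(emb[i], emb[j]))
        loss -= w * log_sigmoid(dot)
    return loss
```

Connected nodes with aligned embeddings incur a lower loss than connected nodes with opposed embeddings, so minimizing the objective pulls neighbors together in embedding space.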

🟢 Applied

Automated Instruction Revision (AIR): A Structured Comparison of Task Adaptation Strategies for LLM

💡 This research compares strategies for adapting LLMs to downstream tasks.

Automated Instruction Revision (AIR) is a rule-induction-based method for adapting large language models to downstream tasks. The paper argues that adaptation performance is strongly task-dependent: no single method dominates across all settings. AIR is most promising when task behavior can be captured by compact, interpretable instruction rules.

Abstract ↗ PDF ↗

🟢 Applied

PhysInOne: Visual Physics Learning and Reasoning in One Suite

💡 This research provides a large synthetic suite for learning visual physics.

PhysInOne is a large-scale synthetic dataset addressing the critical scarcity of physically grounded training data for AI systems. It provides 2 million videos across 153,810 dynamic 3D scenes, covering 71 basic physical phenomena.

Abstract ↗ PDF ↗

🟢 Applied

Beyond Augmented-Action Surrogates for Multi-Expert Learning-to-Defer

💡 This research fixes inconsistent surrogate losses for multi-expert learning-to-defer.

Learning-to-Defer routes each input to the expert that minimizes expected cost, but it assumes that the information available to every expert is fixed at decision time. We show that a broad family of natural separated surrogates, which learn routing and advice with distinct heads, is inconsistent even in the smallest non-trivial setting. We then introduce an augmented surrogate that operates on the composite expert-advice action space.

Abstract ↗ PDF ↗

🟡 Advanced

Sharp description of local minima in the loss landscape of high-dimensional two-layer ReLU neural networks

💡 This research characterizes local minima of two-layer ReLU networks.

We study the population loss landscape of two-layer ReLU networks of the form $f(x) = \frac{1}{k}\sum_{i=1}^{k} \mathrm{ReLU}(w_i^\top x)$ in a realisable teacher-student setting with Gaussian covariates. We show that local minima admit an exact low-dimensional representation in terms of summary statistics, and we further establish a direct link with one-pass SGD. This perspective reveals a hierarchical structure of minima: they are typically isolated in the well-specified regime, but become connected by flat directions as …

Abstract ↗ PDF ↗

🟢 Applied

Is More Data Worth the Cost? Dataset Scaling Laws in a Tiny Attention-Only Decoder

💡 This research measures how the returns from extra training data diminish in a small language model.

Training Transformer language models is expensive, as performance typically improves with increasing dataset size and computational budget. We isolate dataset-size effects using a strongly reduced, attention-only decoder architecture. We observe smooth performance improvements accompanied by clear diminishing returns.

Abstract ↗ PDF ↗
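Diminishing returns of this kind are often summarized by a power law, loss ≈ a · N^(-b) for dataset size N. The summary does not say which functional form the authors fit, so this is only an assumed illustration: a least-squares fit in log-log space, without the irreducible-loss term some scaling-law fits include. The function name is hypothetical.

```python
import math


def fit_power_law(sizes, losses):
    """Fit loss ~= a * size**(-b) by least squares on log(loss) = log(a) - b * log(size)."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(l) for l in losses]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return math.exp(my - slope * mx), -slope  # (a, b)
```

Under this form, every tenfold increase in data multiplies the loss by the same factor, 10^(-b), so each additional order of magnitude buys a smaller absolute improvement at a larger cost.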

🔬

Creative AI / Emotion

🟢 Applied

Yes, But Not Always. Generative AI Needs Nuanced Opt-in

💡 This research argues for more nuanced consent for using creative works in generative AI.

This paper argues that a one-size-fits-all approach to specifying consent for the use of creative works in generative AI is insufficient. Real-world ownership and rights-holder structures make the status quo of binary consent with opt-in by default untenable.

Abstract ↗ PDF ↗

🟢 Applied

Persona-E$^2$: A Human-Grounded Dataset for Personality-Shaped Emotional Responses to Textual Events

💡 This research models how personality shapes readers' emotional responses to text.

Persona-E$^2$ (Persona-Event2Emotion) is a large-scale dataset grounded in annotated MBTI and Big Five traits that captures reader-based emotional variation across news, social media, and life narratives.

Abstract ↗ PDF ↗

🟢 Applied

Three Modalities, Two Design Probes, One Prototype, and No Vision: Experience-Based Co-Design of a Multi-modal 3D Data Visualization Tool

💡 This research makes 3D data visualizations accessible to blind and low-vision users.

Three-dimensional (3D) data visualizations, such as surface plots, are vital in STEM fields from biomedical imaging to spectroscopy, yet remain largely inaccessible to blind and low-vision (BLV) people. We conducted an Experience-Based Co-Design with BLV co-designers who have expertise in non-visual data representations.

Abstract ↗ PDF ↗

🟢 Applied

GRM: Utility-Aware Jailbreak Attacks on Audio LLMs via Gradient-Ratio Masking

💡 This research exposes jailbreak vulnerabilities in audio language models.

Audio large language models (ALLMs) enable rich speech-text interaction, but they also introduce jailbreak vulnerabilities in the audio modality. The authors propose GRM, a utility-aware, frequency-selective jailbreak framework. GRM achieves an average Jailbreak Success Rate of 88.46% while providing a better attack-utility trade-off than representative baselines.

Abstract ↗ PDF ↗

🟢 Applied

Camera Artist: A Multi-Agent Framework for Cinematic Language Storytelling Video Generation

💡 This research generates narrative videos that use explicit cinematic language.

Camera Artist is a multi-agent framework that models a real-world filmmaking workflow to generate narrative videos with explicit cinematic language. It builds upon established agentic pipelines and introduces a dedicated Cinematography Shot Agent.

Abstract ↗ PDF ↗

🟢 Applied

LatentFlowSR: High-Fidelity Audio Super-Resolution via Noise-Robust Latent Flow Matching

💡 This research improves audio super-resolution with latent flow matching.

Audio super-resolution aims to recover missing high-frequency details from bandwidth-limited, low-resolution audio, thereby improving the naturalness and perceptual quality of the reconstructed signal. LatentFlowSR applies conditional flow matching (CFM) within a latent representation space.

Abstract ↗ PDF ↗

🟢 Applied

Few-Shot Contrastive Adaptation for Audio Abuse Detection in Low-Resource Indic Languages

💡 This research detects abusive speech from audio in low-resource Indic languages.

Abusive speech detection is becoming increasingly important as social media shifts towards voice-based interaction. Contrastive Language-Audio Pre-training (CLAP) can support such detection directly from audio.

Abstract ↗ PDF ↗

🟢 Applied

The Speculative Future of Conversational AI for Neurocognitive Disorder Screening: a Multi-Stakeholder Perspective

💡 This research gathers stakeholder perspectives on conversational AI for neurocognitive disorder screening.

Neurocognitive disorders (NCDs) are globally prevalent and require scalable screening methods for proactive management. Prior research has explored the potential of technologies such as conversational AI (CAI) to administer NCD screening tests. However, challenges remain in designing CAI-based solutions that make routine screening socially acceptable and engaging, and that encourage early medical consultation.

Abstract ↗ PDF ↗

🟢 Applied

DialogueSidon: Recovering Full-Duplex Dialogue Tracks from In-the-Wild Dialogue Audio

💡 This research recovers separate speaker tracks from degraded dialogue recordings.

Most in-the-wild two-speaker dialogue is available only as degraded monaural mixtures. We propose DialogueSidon, a model for joint restoration and separation of degraded dialogue audio mixtures.

Abstract ↗ PDF ↗

🟢 Applied

Artificial intelligence can persuade people to take political actions

💡 This research shows that conversational AI can persuade people to take real-world actions.

A growing body of research has found that AI can produce large persuasive effects on people's attitudes, but whether AI can persuade people to take consequential real-world actions has remained unclear. In two large preregistered experiments, we used conversational AI models to persuade participants on a range of attitudinal and behavioural outcomes, including signing real petitions and donating money to charity. We found sizable AI persuasion effects on these behavioural outcomes.

Abstract ↗ PDF ↗

🟢 Applied

Enhance Comprehension of Over-the-Counter Drug Instructions for the General Public and Medical Professionals through Visualization Design

💡 This research improves comprehension of drug instructions through visualization design.

Drug instructions are crucial for guiding the rational use of medication. We conduct a visualization design study to enhance comprehension of over-the-counter (OTC) drug instructions, and we devise two tailored drug instruction designs for different audience groups through an iterative design process.

Abstract ↗ PDF ↗

🟢 Applied

Demonstrably Informed Consent in Privacy Policy Flows: Evidence from a Randomized Experiment

💡 This research tests consent flows that make privacy-policy agreement demonstrably informed.

In most privacy-policy consent flows, agreement is operationalized as a single click at the end of a long, opaque policy document. Recent privacy-law scholarship has argued for a standard of demonstrably informed consent. The authors report that pedagogical friction can strengthen the evidentiary basis of consent, and they clarify what it costs in time and burden.

Abstract ↗ PDF ↗

🟢 Applied

Confidence Without Competence in AI-Assisted Knowledge Work

💡 This research studies how LLM interaction design affects learners' understanding and confidence.

Large language models are widely used by students, yet their tendency to provide fast and complete answers may discourage reflection and foster overconfidence. We examined how alternative LLM interaction designs can support deeper thinking without excessively increasing cognitive burden. Future-self explanations imposed a higher cognitive workload yet yielded the closest alignment between perceived and actual understanding.

Abstract ↗ PDF ↗

🟢 Applied

Do We Really Need to Approach the Entire Pareto Front in Many-Objective Bayesian Optimisation?

💡 This research asks whether many-objective Bayesian optimisation needs to approximate the entire Pareto front.

Many-objective optimisation involves optimising problems with more than three objectives. As the number of objectives increases, the number of solutions needed to adequately represent the entire Pareto front typically grows substantially. This makes it challenging, if not infeasible, to design a search algorithm capable of effectively exploring the entire Pareto front. In Bayesian optimisation, where sample efficiency is critical, it may be more useful to focus on finding a single solution.

Abstract ↗ PDF ↗
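For readers new to the terminology, the Pareto front is the set of solutions that no other solution dominates. The sketch below is our own illustration, not the paper's method: `dominates` and `pareto_front` are hypothetical helper names, and every objective is assumed to be minimised.

```python
from typing import List, Sequence

def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every objective and strictly better in at least one (minimisation)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Keep the points that no other point dominates (an O(n^2) reference implementation)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

if __name__ == "__main__":
    # Four candidate solutions scored on four objectives (more than three, i.e. "many-objective").
    candidates = [(1.0, 4.0, 2.0, 3.0), (2.0, 3.0, 1.0, 4.0), (2.0, 5.0, 3.0, 4.0), (3.0, 1.0, 4.0, 1.0)]
    print(pareto_front(candidates))  # the third candidate is dominated by the first
```

The quadratic check is only a reference version. As the number of objectives grows, a larger share of candidate points tends to be mutually non-dominated, which is one way to see why representing the whole front becomes expensive.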

🟢 Applied

DDSP-QbE++: Improving Speech Quality for Speech Anonymisation for Atypical Speech

💡 This research improves speech quality in voice-conversion-based anonymisation of atypical speech.

Differentiable Digital Signal Processing (DDSP) pipelines for voice conversion rely on subtractive synthesis. In DDSP-QbE, the excitation is generated via phase accumulation, producing a sawtooth-like waveform whose abrupt discontinuities introduce aliasing artefacts that manifest as buzziness and spectral distortion.

Abstract ↗ PDF ↗
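To see where the buzziness comes from, the sketch below builds a naive sawtooth by phase accumulation (the phase wrap produces a jump once per period) and, for comparison, the same oscillator with a PolyBLEP correction. PolyBLEP is a standard band-limiting trick chosen here purely for illustration; the summary does not say that DDSP-QbE++ uses it, and the function names are hypothetical.

```python
import numpy as np

def naive_sawtooth(f0: np.ndarray, sr: int) -> np.ndarray:
    """Sawtooth via phase accumulation; the phase wrap causes a jump of about -2 once per period."""
    phase = np.cumsum(f0 / sr) % 1.0
    return 2.0 * phase - 1.0

def polyblep_sawtooth(f0: np.ndarray, sr: int) -> np.ndarray:
    """Same oscillator with a polynomial band-limited step (PolyBLEP) residual subtracted at each wrap."""
    dt = f0 / sr                          # phase increment per sample, in cycles
    phase = np.cumsum(dt) % 1.0
    out = 2.0 * phase - 1.0
    residual = np.zeros_like(out)
    after = phase < dt                    # first sample after a wrap
    t = phase[after] / dt[after]
    residual[after] = 2.0 * t - t * t - 1.0
    before = phase > 1.0 - dt             # last sample before a wrap
    t = (phase[before] - 1.0) / dt[before]
    residual[before] = t * t + 2.0 * t + 1.0
    return out - residual

if __name__ == "__main__":
    sr = 16000
    f0 = np.full(sr, 440.0)               # one second of a constant 440 Hz pitch
    print(np.max(np.abs(np.diff(naive_sawtooth(f0, sr)))))
    print(np.max(np.abs(np.diff(polyblep_sawtooth(f0, sr)))))
```

Both functions expect a per-sample `f0` array in Hz and assume it stays well below `sr / 2`; the corrected version spreads each jump over two samples, which reduces (but does not eliminate) aliasing.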

🟢 Applied

Structuring versus Problematizing: How LLM-based Agents Scaffold Learning in Diagnostic Reasoning

💡 This research compares how LLM-based agents scaffold learners' diagnostic reasoning.

Novices often face cognitive biases such as premature closure and over-reliance on heuristics, and they struggle to transfer diagnostic strategies to new cases. Scenario-based learning (SBL) enhanced by Learning Analytics (LA) and large language models offers a promising approach.

Abstract ↗ PDF ↗

🟢 Applied

EquiformerV3: Scaling Efficient, Expressive, and General SE(3)-Equivariant Graph Attention Transformers

💡 This research makes SE(3)-equivariant graph attention Transformers more efficient, expressive, and general.

EquiformerV3, the third generation of the $SE(3)$-equivariant graph attention Transformer, is designed to advance all three dimensions: efficiency, expressivity, and generality. SwiGLU-$S^2$ activations and smooth-cutoff attention enable accurate modeling of smoothly varying potential energy surfaces.

Abstract ↗ PDF ↗

🟢 Applied

VL-Calibration: Decoupled Confidence Calibration for Large Vision-Language Models Reasoning

💡 This research improves confidence calibration in large vision-language models.

Large Vision-Language Models (LVLMs) achieve strong multimodal reasoning but frequently hallucinate and give incorrect responses with high certainty. VL-Calibration decouples confidence into visual confidence and reasoning confidence. We propose token-level advantage reweighting to focus optimization on tokens according to their visual certainty.

Abstract ↗ PDF ↗

🟢 Applied

Many Ways to Be Fake: Benchmarking Fake News Detection Under Strategy-Driven AI Generation

💡 This research benchmarks the detection of fake news produced through human-AI collaboration.

Modern fake news arises through human-AI collaboration, where strategic inaccuracies are embedded within otherwise accurate and credible narratives. Such mixed-truth cases represent a realistic and consequential threat, yet they remain underrepresented in existing benchmarks. We introduce a synthetic benchmark containing 6,798 fake news articles.

Abstract ↗ PDF ↗

🟢 Applied

VISOR: Agentic Visual Retrieval-Augmented Generation via Iterative Search and Over-horizon Reasoning

💡 This research explores techniques in language AI.

Visual Retrieval-Augmented Generation (VRAG) empowers Vision-Language Models to retrieve and reason over visually rich documents. However, the accumulation of visual tokens across retrieved pages dilutes context and causes cognitive overload, leading agents to deviate from their search objective.

Abstract ↗ PDF ↗

🟢 Applied

Strategic Algorithmic Monoculture: Experimental Evidence from Coordination Games

💡 This research compares how LLM agents and humans coordinate in strategic games.

AI agents increasingly operate in multi-agent environments where outcomes depend on coordination. While LLMs coordinate extremely well on similar actions, they lag behind humans in sustaining heterogeneity when divergence is rewarded.

Abstract ↗ PDF ↗

🟢 Applied

Process Reward Agents for Steering Knowledge-Intensive Reasoning

💡 This research steers knowledge-intensive reasoning with step-wise rewards at test time.

Process Reward Agents (PRA) is a test-time method for providing domain-grounded, online, step-wise rewards to a frozen policy. In contrast to prior retrieval-augmented process reward models (PRMs), PRA enables search-based decoding to rank and prune candidate trajectories at every generation step. Experiments on multiple medical reasoning benchmarks show that PRA consistently outperforms strong baselines.

Abstract ↗ PDF ↗
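The "rank and prune at every generation step" idea can be pictured as a reward-guided beam search. In this minimal sketch, `propose` stands in for the frozen policy and `reward` for the step-wise reward agent; both are hypothetical callables, and the beam-style pruning rule is an assumption rather than PRA's exact procedure.

```python
from typing import Callable, List, Tuple

Step = str
Trajectory = Tuple[Step, ...]

def reward_guided_search(
    propose: Callable[[Trajectory], List[Step]],  # hypothetical: frozen policy proposing next steps
    reward: Callable[[Trajectory], float],        # hypothetical: step-wise reward for a partial trajectory
    beam_width: int,
    max_steps: int,
) -> Trajectory:
    """Expand every kept trajectory, score each extension, and prune to the top `beam_width`."""
    beam: List[Trajectory] = [()]
    for _ in range(max_steps):
        candidates = [traj + (step,) for traj in beam for step in propose(traj)]
        if not candidates:
            break
        candidates.sort(key=reward, reverse=True)  # rank candidates by step-level reward
        beam = candidates[:beam_width]             # prune everything else
    return max(beam, key=reward)
```

In a toy run where the policy always offers steps `"a"` and `"b"` and the reward counts `"a"`, the search settles on the trajectory made only of `"a"` steps.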

🟢 Applied

SafeMind: A Risk-Aware Differentiable Control Framework for Adaptive and Safe Quadruped Locomotion

💡 This research makes quadruped locomotion safer and more adaptive.

SafeMind unifies probabilistic Control Barrier Functions (CBFs) with semantic context understanding and meta-adaptive risk calibration. A semantics-to-constraint encoder modulates safety margins using perceptual or language cues. SafeMind reduces safety violations by 3–10x and energy consumption by 10–15% relative to state-of-the-art CBF, MPC, and hybrid RL baselines.

Abstract ↗ PDF ↗
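A Control Barrier Function keeps a system inside a safe set by minimally editing a nominal command. The sketch below shows only the basic deterministic version for a one-dimensional single integrator $\dot x = u$ with barrier $h(x) = x_{\max} - x$; SafeMind's probabilistic CBFs, semantic encoder, and risk calibration are not modeled, and all names and parameter values are illustrative assumptions.

```python
from typing import List

def cbf_filter(x: float, u_nom: float, x_max: float, alpha: float) -> float:
    """Minimal-change safety filter for the single integrator x_dot = u.

    Barrier: h(x) = x_max - x, safe set h >= 0.
    CBF condition h_dot >= -alpha * h  <=>  -u >= -alpha * (x_max - x)  <=>  u <= alpha * h.
    The QP  min (u - u_nom)^2  s.t.  u <= alpha * h  is solved in closed form by clipping.
    """
    h = x_max - x
    return min(u_nom, alpha * h)

def simulate(x0: float, u_nom: float, x_max: float, alpha: float, dt: float, steps: int) -> List[float]:
    """Forward-Euler rollout that applies the filtered command at every step."""
    x, traj = x0, [x0]
    for _ in range(steps):
        x += dt * cbf_filter(x, u_nom, x_max, alpha)
        traj.append(x)
    return traj
```

Far from the boundary the nominal command passes through unchanged; near it, the command is clipped. With forward-Euler steps the discrete system stays safe as long as `alpha * dt < 1`, because the gap to the boundary then shrinks by at most a factor `1 - alpha * dt` per step instead of changing sign.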

🟢 Applied

Silence and Noise: Self-censorship and Opinion Expression on Social Media

💡 This research studies self-censorship and opinion expression on social media.

Social media users who are embedded within larger audiences, post less frequently, or perceive less support are less likely to express their opinions. Those who do speak often adjust their expressed views to align with perceived group norms.

Abstract ↗ PDF ↗

🟢 Applied

Intent Lenses: Inferring Capture-Time Intent to Transform Opportunistic Photo Captures into Structured Visual Notes

💡 This research explores techniques in language AI.

We introduce Intent Lenses, a conceptual primitive for intent-mediated note generation and sensemaking. We present an interactive system that infers lenses from presentation captures to generate structured visual notes on a spatial canvas. Users can further add, link, and arrange lenses across captures to support exploration.

Abstract ↗ PDF ↗

🔬

Lightweight Systems

🟢 Applied

EdgeFlow: Fast Cold Starts for LLMs on Mobile Devices

💡 This research speeds up cold starts for LLMs on mobile devices.

EdgeFlow is a mobile LLM inference framework that mitigates the cold-start issue by adaptively adjusting the precision of LLM parameters. EdgeFlow reduces cold-start latency by up to 4.07x compared with three state-of-the-art mobile LLM baselines.

Abstract ↗ PDF ↗

🟢 Applied

SHIELD: A Segmented Hierarchical Memory Architecture for Energy-Efficient LLM Inference on Edge NPUs

💡 This research reduces the energy cost of LLM inference on edge NPUs.

Large Language Model (LLM) inference on edge Neural Processing Units (NPUs) is fundamentally constrained by limited on-chip memory capacity. We propose SHIELD, a lifecycle-aware segmented eDRAM architecture that jointly exploits temporal residency and bit-level sensitivity in bfloat16 (BF16) activations. SHIELD isolates the sign and exponent fields from the mantissa, disables refresh for transient QO mantissas, and applies relaxed refresh to persistent …

Abstract ↗ PDF ↗
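BF16 keeps the top 16 bits of an IEEE-754 float32: 1 sign bit, 8 exponent bits, and 7 mantissa bits. The sketch below splits a value into those fields, the same fields SHIELD separates. It truncates rather than rounds when converting from float32, which is a simplification (hardware conversions usually round to nearest even), and the helper names are hypothetical.

```python
import struct
from typing import Tuple

def to_bf16_bits(x: float) -> int:
    """Top 16 bits of the float32 encoding of x (truncation; hardware usually rounds to nearest even)."""
    (bits32,) = struct.unpack(">I", struct.pack(">f", x))
    return bits32 >> 16

def bf16_fields(x: float) -> Tuple[int, int, int]:
    """Split a BF16 value into (sign: 1 bit, exponent: 8 bits, mantissa: 7 bits)."""
    b = to_bf16_bits(x)
    return b >> 15, (b >> 7) & 0xFF, b & 0x7F

if __name__ == "__main__":
    for value in (1.0, -2.0, 1.5):
        print(value, bf16_fields(value))
```

The 9 sign and exponent bits set a value's magnitude and sign, while the 7 mantissa bits only refine it, which is why treating the two groups differently is plausible in the first place.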

🟢 Applied

Contextual Chain: Single-State Ledger Design for Mobile/IoT Networks with Frequent Partitions

💡 This research studies a lightweight ledger protocol for mobile and IoT networks with frequent partitions.

We study a lightweight ledger protocol for intermittent and noisy networks. We evaluate the protocol with a discrete-event simulator under controlled partitions and two network regimes. The main result is that quarantine alone does not materially improve agreement or recovery under noisy conditions.

Abstract ↗ PDF ↗

🟢 Applied

PG-MDP: Profile-Guided Memory Dependence Prediction for Area-Constrained Cores

💡 This research makes memory dependence prediction effective on area-constrained cores.

Memory Dependence Prediction (MDP) is a speculative technique that predicts which stores, if any, a given load depends on. Area-constrained cores are increasingly relevant in applications such as energy-efficient and edge systems. This paper argues that targeting the predictor's working set is as effective as growing the predictor.

Abstract ↗ PDF ↗

🟢 Applied

Administrative Decentralization in Edge-Cloud Multi-Agent for Mobile Automation

💡 This research decentralizes administration across edge and cloud agents for mobile automation.

AdecPilot employs a Hierarchical Implicit Termination protocol to enforce deterministic stops and prevent post-completion hallucinations. The source code is available at https://anonymous.4open.science/r/Anonymous_code-B8AB.

Abstract ↗ PDF ↗

🟢 Applied

From LLM to Silicon: RL-Driven ASIC Architecture Exploration for On-Device AI Inference

💡 This research automates ASIC design-space exploration for on-device AI inference.

An RL-driven compiler jointly optimizes ASIC architecture, memory hierarchy, and workload partitioning for AI inference across process nodes from 3nm to 28nm. The design space is formulated as a single Markov Decision Process with mixed discrete-continuous actions and a unified Power-Performance-Area objective.

Abstract ↗ PDF ↗

🟡 Advanced

Beyond End-to-End: Dynamic Chain Optimization for Private LLM Adaptation on the Edge

💡 This research proposes a method for language AI.

Chain Federated Fine-Tuning (ChainFed) forgoes end-to-end updates in favor of sequential, layer-by-layer training. ChainFed trains the initial adapter to convergence, freezes its weights, and then proceeds to the next adapter. This iterative train-and-freeze process forms an optimization chain that gradually enhances the model's task-specific proficiency. Extensive experiments demonstrate the superiority of ChainFed over existing methods.

Abstract ↗ PDF ↗

🟢 Applied

MATCHA: Efficient Deployment of Deep Neural Networks on Multi-Accelerator Heterogeneous Edge SoCs

💡 This research speeds up machine learning.

MATCHA is a unified DNN deployment framework that generates highly concurrent schedules for parallel, heterogeneous accelerators. It uses constraint programming to optimize L3/L2 memory allocation and scheduling. Pattern matching, tiling, and mapping across individual hardware units enable parallel execution.

Abstract ↗ PDF ↗

🟢 Applied

Sensor Placement for Tsunami Early Warning via Large-Scale Bayesian Optimal Experimental Design

💡 This research optimizes sensor placement for tsunami early warning.

Real-time tsunami early warning relies on distributed sensor networks to infer seismic sources and seafloor motion. Optimizing these networks via Bayesian optimal experimental design (OED) is exceptionally challenging for systems governed by hyperbolic partial differential equations. We present a scalable Bayesian OED framework for linear time-invariant systems.

Abstract ↗ PDF ↗

🟢 Applied

The Hyperscale Lottery: How State-Space Models Have Sacrificed Edge Efficiency

💡 This research argues that state-space models have been optimized for cloud throughput at the expense of edge efficiency.

The Hardware Lottery posits that research directions are dictated by the available silicon compute platforms. We identify a derivative phenomenon in which model architectures are optimized for cloud throughput at the expense of algorithmic efficiency. We argue for decoupling cloud-scale saturation strategies from core architectural design to preserve the viability of single-user, real-time edge intelligence.

Abstract ↗ PDF ↗

🟢 Applied

TRAPTI: Time-Resolved Analysis for SRAM Banking and Power Gating Optimization in Embedded Transformer Inference

💡 This research optimizes SRAM banking and power gating for embedded Transformer inference.

Transformer neural networks achieve state-of-the-art accuracy across language and vision tasks, but deployment on embedded hardware is hindered by stringent area, latency, and energy constraints. During inference, performance and efficiency are increasingly dominated by the Key-Value (KV) cache.

Abstract ↗ PDF ↗

🟢 Applied

TensorHub: Scalable and Elastic Weight Transfer for LLM RL Training

💡 This research makes weight transfer for LLM reinforcement learning training more scalable and elastic.

Reference-Oriented Storage (ROS) is a new storage abstraction for RL weight transfer. ROS presents the illusion that certain versions of the model weights are stored and can be fetched on demand. In fact, ROS does not physically store any copies of the weights; instead, it tracks the workers that hold these weights on their GPUs for inference.

Abstract ↗ PDF ↗
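The "illusion" of stored versions can be pictured as a registry that maps each weight version to the workers currently holding it. This is a hypothetical sketch of that bookkeeping only (`ReferenceRegistry`, `register`, `evict`, and `locate` are invented names); the actual movement of tensors between GPUs, which TensorHub is responsible for, is out of scope.

```python
import random
from collections import defaultdict
from typing import Dict, Optional, Set

class ReferenceRegistry:
    """Records *which workers* hold each weight version instead of storing the weights."""

    def __init__(self) -> None:
        self._holders: Dict[int, Set[str]] = defaultdict(set)

    def register(self, version: int, worker: str) -> None:
        """A worker reports that it now holds `version` in GPU memory."""
        self._holders[version].add(worker)

    def evict(self, version: int, worker: str) -> None:
        """A worker drops `version`; forget the version once nobody holds it."""
        self._holders[version].discard(worker)
        if not self._holders[version]:
            del self._holders[version]

    def locate(self, version: int, rng: Optional[random.Random] = None) -> Optional[str]:
        """Pick a worker that can serve `version`, or None if no GPU still holds it."""
        holders = self._holders.get(version)
        if not holders:
            return None
        return (rng or random).choice(sorted(holders))
```

A fetch for a version then becomes "ask `locate` for a holder and pull from that worker", so no extra copy of the weights is kept by the storage layer itself.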

🟢 Applied

Memory Wall is not gone: A Critical Outlook on Memory Architecture in Digital Neuromorphic Computing

💡 This research examines memory bottlenecks in digital neuromorphic architectures.

The rapid advancement of neuromorphic technology aims to address the memory wall inherent in conventional von Neumann architectures. Although neuromorphic designs bring computation closer to memory through distributed architectures, our findings indicate that on-chip memory systems have become significant consumers of area and energy.

Abstract ↗ PDF ↗

🟢 Applied

NL-CPS: Reinforcement Learning-Based Kubernetes Control Plane Placement in Multi-Region Clusters

💡 This research uses reinforcement learning to place Kubernetes control-plane nodes in multi-region clusters.

Placement of Kubernetes control-plane nodes is critical to ensuring cluster reliability, scalability, and performance. Existing initialisation procedures typically select control-plane nodes arbitrarily, without considering node resource capacity or network topology.

Abstract ↗ PDF ↗

🟢 Applied

Trilinear Compute-in-Memory Architecture for Energy-Efficient Transformer Acceleration

💡 This research accelerates Transformer self-attention with compute-in-memory hardware.

Self-attention in Transformers generates dynamic operands that force conventional Compute-in-Memory accelerators into costly non-volatile memory (NVM) reprogramming cycles, degrading throughput and stressing device endurance. We present TrilinearCIM, a Double-Gate FeFET (DG-FeFET)-based architecture that uses back-gate modulation to realize a three-operand multiply-accumulate primitive.

Abstract ↗ PDF ↗

🟢 Applied

Foundry: Template-Based CUDA Graph Context Materialization for Fast LLM Serving Cold Start

💡 This research cuts cold-start latency for LLM serving.

Foundry is a template-based CUDA graph context materialization system. It persists both graph topology and execution context during offline processing. Foundry reduces cold-start latency by up to 99%, cutting the initialization time of Qwen3-235B-A22B from 10 minutes to 3.9 seconds.

Abstract ↗ PDF ↗

🟢 Applied

Sustaining Exascale Performance: Lessons from HPL and HPL-MxP on Aurora

💡 This research reports lessons from sustaining exascale HPL and HPL-MxP performance on Aurora.

Sustaining exascale performance in production requires engineering choices and operational practices that emerge only under real deployment constraints and demand coordination across system layers. HPL-MxP reached 11.64 EF/s, an 11.5x speedup over FP64 enabled by mixed-precision arithmetic.

Abstract ↗ PDF ↗

🟢 Applied

A 0.5-V Linear Neuromorphic Voltage-to-Spike Encoder Using a Bulk-Driven Transconductor

💡 This research presents an ultralow-power voltage-to-spike encoder for neuromorphic hardware.

This work introduces an ultralow-power voltage-to-spike encoder. The encoder deviates from linearity by less than 5.6 percent over a 0.1–0.4 V input range. It is fabricated in TSMC 0.18-µm CMOS and operates at VDD = 0.5 V with a 2–27 nA reference current.

Abstract ↗ PDF ↗

🟢 Applied

From Indiscriminate to Targeted: Efficient RTL Verification via Functionally Key Signal-Driven LLM Assertion Generation

💡 This research focuses LLM-based RTL assertion generation on functionally key signals.

Assertion-Based Verification (ABV) is key to reducing debugging time. However, existing methods pursue indiscriminate verification, aiming for maximal coverage without considering signal criticality. We propose AgileAssert, a key signal-driven assertion generation framework.

Abstract ↗ PDF ↗

🟢 Applied

Taming GPU Underutilization via Static Partitioning and Fine-grained CPU Offloading

💡 This research reduces GPU underutilization through static partitioning and fine-grained CPU offloading.

Advances in GPU compute throughput and memory capacity bring significant opportunities to a wide range of workloads. Multi-Instance GPU (MIG) is a promising approach to improving utilization: it partitions GPU compute and memory resources into fixed-size, isolated slices.

Abstract ↗ PDF ↗

🟢 Applied

City-Scale Visibility Graph Analysis via GPU-Accelerated HyperBall

💡 This research scales visibility graph analysis to whole cities with GPUs.

Visibility Graph Analysis (VGA) is a key space syntax method for understanding how spatial configuration shapes human movement. Its reliance on all-pairs BFS computation limits practical application to small study areas. We present a system combining three techniques to scale VGA to city-scale problems.

Abstract ↗ PDF ↗
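The exact computation that limits VGA is a breadth-first search from every cell of the visibility graph. The sketch below computes one common VGA measure, mean depth, in that exact way; its cost grows roughly as O(V·E), which is why it does not scale to cities. HyperBall instead approximates neighbourhood sizes with HyperLogLog counters; that part and the paper's GPU implementation are not shown, and `mean_depths` is a hypothetical name.

```python
from collections import deque
from typing import Dict, List

def mean_depths(adj: Dict[int, List[int]]) -> Dict[int, float]:
    """Exact all-pairs BFS: for each node, the average shortest-path depth to every reachable node."""
    result: Dict[int, float] = {}
    for src in adj:
        depth = {src: 0}
        queue = deque([src])
        while queue:                      # standard BFS from `src`
            u = queue.popleft()
            for v in adj[u]:
                if v not in depth:
                    depth[v] = depth[u] + 1
                    queue.append(v)
        reachable = len(depth) - 1        # exclude the source itself
        result[src] = sum(depth.values()) / reachable if reachable else 0.0
    return result
```

On a three-cell corridor `0 - 1 - 2`, the middle cell has the lowest mean depth (1.0) and the two ends have 1.5, matching the intuition that central locations are visually better integrated.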

🟢 Applied

LegoDiffusion: Micro-Serving Text-to-Image Diffusion Workflows

💡 This research improves the serving of text-to-image diffusion workflows.

LegoDiffusion decomposes a workflow into loosely coupled model-execution nodes that can be managed independently. It outperforms existing diffusion workflow serving systems, sustaining up to 3x higher request rates and tolerating up to 8x higher burst traffic.

Abstract ↗ PDF ↗

🟢 Applied

FILCO: Flexible Composing Architecture with Real-Time Reconfigurability for DNN Acceleration

💡 This research improves hardware resource efficiency for DNN acceleration through real-time reconfiguration.

With the development of deep neural network (DNN) enabled applications, achieving high hardware resource efficiency on diverse workloads is non-trivial in heterogeneous computing platforms. Compared with prior works, our design achieves 1.3x–5x higher throughput and hardware efficiency. We also evaluate the FILCO framework on the 7nm AMD Versal VCK190 board.

Abstract ↗ PDF ↗

🟡 Advanced

Parallel Batch-Dynamic Maximal Independent Set

💡 This research gives the first theoretically efficient parallel batch-dynamic algorithm for maximal independent set.

We develop the first theoretically efficient algorithm for maintaining the maximal independent set (MIS) of a graph in the parallel batch-dynamic setting. In this setting, a graph is updated with batches of edge insertions and deletions. For a batch of $b$ updates, our algorithm has $O(b \log^3 n)$ expected work and polylogarithmic depth with high probability.

Abstract ↗ PDF ↗
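For reference, the static problem has a simple sequential solution: scan the vertices in some order and keep each vertex that has no already-chosen neighbour. The sketch below (hypothetical helper names) does this and checks the result. It is not the paper's algorithm; recomputing it from scratch after every batch costs work proportional to the whole graph, which is exactly the cost a batch-dynamic algorithm avoids.

```python
from typing import Dict, Iterable, Set

def greedy_mis(adj: Dict[int, Set[int]], order: Iterable[int]) -> Set[int]:
    """Static sequential greedy MIS: keep each vertex (in `order`) with no chosen neighbour."""
    mis: Set[int] = set()
    for v in order:
        if not (adj[v] & mis):
            mis.add(v)
    return mis

def is_maximal_independent(adj: Dict[int, Set[int]], s: Set[int]) -> bool:
    """Independent: no edge inside s. Maximal: every vertex outside s has a neighbour in s."""
    independent = all(not (adj[v] & s) for v in s)
    maximal = all(v in s or bool(adj[v] & s) for v in adj)
    return independent and maximal
```

On the path `0 - 1 - 2 - 3` scanned in index order, the greedy pass keeps vertices 0 and 2; a different scan order can yield a different, equally valid MIS.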

🟢 Applied

Making Room for AI: Multi-GPU Molecular Dynamics with Deep Potentials in GROMACS

💡 This research brings AI interatomic potentials to multi-GPU molecular dynamics in GROMACS.

GROMACS is a de facto standard for classical Molecular Dynamics (MD). The rise of AI-driven interatomic potentials that pursue near-quantum accuracy at MD throughput now poses a significant challenge: embedding neural-network inference into multi-GPU simulations while retaining high performance.

Abstract ↗ PDF ↗

🔬

Offline-First / Local AI

🟢 Applied

Drift-Aware Online Dynamic Learning for Nonstationary Multivariate Time Series: Application to Sintering Quality Prediction

💡 This research adapts online multi-output prediction to drifting time series, with an application to sintering quality.

The Drift-Aware Multi-Scale Dynamic Learning (DA-MSDL) framework maintains robust multi-output predictive performance on nonstationary data streams through online adaptive mechanisms. It employs a multi-scale bi-branch convolutional network as its backbone to disentangle local fluctuations from long-term trends.

Abstract ↗ PDF ↗

🟡 Advanced

Distributed Online Convex Optimization with Compressed Communication: Optimal Regret and Applications

💡 This research reduces communication cost in distributed online convex optimization.

Distributed online convex optimization (D-OCO) is a powerful paradigm for modeling distributed scenarios with streaming data. However, the communication cost between local learners and the central server is substantial in large-scale applications.

Abstract ↗ PDF ↗

🟢 Applied

Towards Lifelong Aerial Autonomy: Geometric Memory Management for Continual Visual Place Recognition in Dynamic Environments

💡 This research explores techniques in computer vision.

Robust geo-localization in changing environmental conditions is critical for long-term aerial autonomy. Existing continual learning methods often fail here because geographic features exhibit severe intra-class variations. To respect strict onboard storage constraints, our pipeline decouples geographic knowledge into static satellite anchors and a dynamic experience replay buffer.

Abstract ↗ PDF ↗

🟢 Applied

Nexus: Same Pretraining Loss, Better Downstream Generalization via Common Minima

💡 This research improves the downstream generalization of pretrained language models.

Pretraining is the cornerstone of Large Language Models, consuming the vast majority of the computational budget and data and serving as the primary engine of their capabilities. We hypothesize that the geometric "closeness" of task-specific minima is intrinsically linked to downstream generalization. We propose the Nexus optimizer, which encourages the closeness of these minima by maximizing gradient similarity during optimization. Nexus reduces the out-of-distribution loss by 0.012 and yields up to a 15 …

Abstract ↗ PDF ↗

🟢 Applied

Temporal Patch Shuffle (TPS): Leveraging Patch-Level Shuffling to Boost Generalization and Robustness in Time Series Forecasting

💡 This research improves the generalization and robustness of time-series forecasters through data augmentation.

Temporal Patch Shuffle (TPS) is a simple, model-agnostic data augmentation method for forecasting. It extracts overlapping temporal patches and selectively shuffles a subset of them, using variance-based ordering as a conservative heuristic. This design increases sample diversity while preserving forecast-consistent local temporal structure.

Abstract ↗ PDF ↗
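A rough sketch of this kind of augmentation is below. The summary leaves the details open, so several choices are assumptions: patch length, stride, and the shuffled fraction are free parameters; the lowest-variance patches are the ones that swap places; and overlapping patches are recombined by averaging. The function name is hypothetical, and this should not be read as the authors' implementation.

```python
import numpy as np

def temporal_patch_shuffle(x: np.ndarray, patch_len: int, stride: int, frac: float,
                           rng: np.random.Generator) -> np.ndarray:
    """Hypothetical TPS-style augmentation for a 1-D series.

    Assumptions: overlapping patches of length `patch_len` every `stride` steps; the
    `frac` share of patches with the lowest variance swap positions among themselves;
    the series is rebuilt by averaging overlapping patches.
    """
    x = np.asarray(x, dtype=float)
    starts = np.arange(0, len(x) - patch_len + 1, stride)
    patches = np.stack([x[s:s + patch_len] for s in starts])
    k = int(frac * len(starts))
    chosen = np.argsort(patches.var(axis=1))[:k]        # lowest-variance patches (assumed rule)
    patches[chosen] = patches[rng.permutation(chosen)]  # shuffle their positions
    total = np.zeros_like(x)
    count = np.zeros_like(x)
    for s, p in zip(starts, patches):                   # overlap-add the (partly shuffled) patches
        total[s:s + patch_len] += p
        count[s:s + patch_len] += 1
    # Positions not covered by any patch keep their original values.
    return np.where(count > 0, total / np.maximum(count, 1), x)
```

With `frac = 0` the function returns the input unchanged, which is a handy sanity check; shuffling only low-variance patches is one way to read "conservative", since moving flat segments disturbs the series least.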

🟢 Applied

Feature-Label Modal Alignment for Robust Partial Multi-Label Learning

💡 This research makes partial multi-label learning more robust to noisy candidate labels.

In partial multi-label learning (PML), each instance is associated with a set of candidate labels containing both ground-truth and noisy labels. The presence of noisy labels disrupts the correspondence between features and labels, degrading classification performance. We propose a novel PML method based on feature-label modal alignment (PML-MA).

Abstract ↗ PDF ↗

🟢 Applied

Plasticity-Enhanced Multi-Agent Mixture of Experts for Dynamic Objective Adaptation in UAVs-Assisted Emergency Communication Networks

💡 This research helps multi-agent UAV policies adapt to changing objectives in emergency communication networks.

PE-MAMoE equips each UAV with a sparsely gated mixture-of-experts actor whose router selects a single specialist per step. After phase switches, a Phase Controller injects brief, expert-only stochastic perturbations, resets the action log-standard-deviation, entropy, and learning rate, and schedules the router temperature, all to re-plasticize the policy without destabilizing safe behaviors.

Abstract ↗ PDF ↗

🟢 Applied

Biologically-Grounded Multi-Encoder Architectures as Developability Oracles for Antibody Design

💡 This research predicts antibody developability with protein language model encoders.

CrossAbSense is a framework of property-specific neural oracles that combine frozen protein language model encoders with configurable attention decoders. On the GDPa1 benchmark of 242 therapeutic IgGs, these oracles improve on established baselines by 12–20% on three of five developability assays.

Abstract ↗ PDF ↗

🟢 Applied

Bringing Clustering to MLL: Weakly-Supervised Clustering for Partial Multi-Label Learning

💡 This research applies clustering to partial multi-label learning with noisy labels.

Label noise in multi-label learning (MLL) poses significant challenges for model training. We propose WSC-PML, a novel weakly-supervised clustering approach for partial multi-label learning (PML). WSC-PML employs a three-stage process: initial prototype learning from noisy labels, adaptive confidence-based weak supervision construction, and joint optimization via iterative clustering refinement.

Abstract ↗ PDF ↗

🟢 Applied

Hierarchical Flow Decomposition for Turning Movement Prediction at Signalized Intersections

💡 This research predicts turning movements at signalized intersections.

HFD-TM (Hierarchical Flow-Decomposition for Turning Movement Prediction) is a deep learning framework that predicts turning movements by first forecasting corridor through-movements and then expanding these predictions to individual turning streams. The design is motivated by empirical traffic structure: corridor flows account for 65.1% of total volume and exhibit lower volatility than turning movements.

Abstract ↗ PDF ↗

🟢 Applied

Stability Enhanced Gaussian Process Variational Autoencoders

💡 This research improves the stability of Gaussian process variational autoencoders.

A novel stability-enhanced Gaussian process variational autoencoder (SEGP-VAE) is proposed for indirectly training a low-dimensional linear time-invariant (LTI) system using high-dimensional video data. The mean and covariance functions of the novel SEGP prior are derived from the definition of an LTI system.

Abstract ↗ PDF ↗

🟢 Applied

Online Intention Prediction via Control-Informed Learning

💡 This research predicts the goals of autonomous systems online.

This paper presents an online intention prediction framework for estimating the goal state of autonomous systems in real time. The problem is formulated as an inverse optimal control / inverse reinforcement learning task, with intention treated as a parameter in the objective.

Abstract ↗ PDF ↗

🟢 Applied

Natural Riemannian gradient for learning functional tensor networks

💡 This research optimizes machine learning.

We consider machine learning tasks with low-rank functional tree tensor networks (TTN) as the learning model. We propose a natural Riemannian gradient descent type approach, applicable to arbitrary losses, that is based on Amari's natural gradient.

Abstract ↗ PDF ↗

🟢 Applied

Beyond Segmentation: Structurally Informed Facade Parsing from Imperfect Images

💡 This research adds structural priors to building facade parsing.

Standard object detectors typically treat architectural elements independently. We address this limitation by augmenting the YOLOv8 training objective with a custom lightweight alignment loss. This regularization encourages grid-consistent arrangements of bounding boxes during training.

Abstract ↗ PDF ↗

🟢 Applied

Statistical Properties of the King Wen Sequence: An Anti-Habituation Structure That Does Not Improve Neural Network Training

💡 This research statistically characterizes the King Wen sequence and tests whether it helps neural network training.

The King Wen sequence of the I-Ching (c. 1000 BC) orders 64 hexagrams in a pattern that has puzzled scholars for three millennia. We present a rigorous statistical characterization of this ordering using Monte Carlo permutation analysis against 100,000 random baselines. We find that the sequence has four statistically significant properties: higher-than-random transition distance, negative lag-1 autocorrelation, yang-balanced groups of four, and asymmetric within-pair …

Abstract ↗ PDF ↗
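The Monte Carlo permutation analysis can be illustrated for one of the four properties, negative lag-1 autocorrelation. The sketch below assumes the ordering has already been encoded as a numeric sequence (the encoding is left open and is an assumption) and compares its statistic against random reorderings; the function names are hypothetical.

```python
import numpy as np

def lag1_autocorr(x: np.ndarray) -> float:
    """Sample lag-1 autocorrelation of a 1-D numeric sequence."""
    d = np.asarray(x, dtype=float) - np.mean(x)
    return float(np.dot(d[:-1], d[1:]) / np.dot(d, d))

def permutation_p_value(x: np.ndarray, n_perm: int = 100_000, seed: int = 0) -> float:
    """One-sided Monte Carlo p-value for 'lag-1 autocorrelation is lower than under random orderings'."""
    rng = np.random.default_rng(seed)
    observed = lag1_autocorr(x)
    null = np.array([lag1_autocorr(rng.permutation(x)) for _ in range(n_perm)])
    # The +1 terms count the observed ordering itself, so the p-value is never exactly zero.
    return float((1 + np.sum(null <= observed)) / (1 + n_perm))
```

Permuting the same values keeps the marginal distribution fixed, so a small p-value points to the *order* rather than the *content* of the sequence; the default of 100,000 permutations mirrors the baseline count in the summary but is slow in pure Python.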

🟡 Advanced

A Predictive View on Streaming Hidden Markov Models

💡 This research optimizes machine learning.

We develop a predictive-first optimisation framework for streaming hidden Markov models. We assume access to regime-specific predictive models whose parameters are learned online while maintaining a fixed transition prior over regimes. Our objective is to sequentially identify latent regimes while maintaining accurate step-ahead predictive distributions.

Abstract ↗ PDF ↗
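Sequentially identifying regimes under a fixed transition prior builds on the standard HMM forward (filtering) recursion: predict with the transition matrix, then reweight by each regime's predictive likelihood of the new observation. The sketch below shows that textbook step and the resulting step-ahead mixture weights; the regime models themselves and the paper's predictive-first objective are not shown, and the function names are hypothetical.

```python
import numpy as np

def filter_step(belief: np.ndarray, transition: np.ndarray, likelihoods: np.ndarray) -> np.ndarray:
    """Posterior over regimes after observing y_t.

    belief[k]       = P(regime_{t-1} = k | y_{1:t-1})
    transition[j,k] = P(regime_t = k | regime_{t-1} = j)   (fixed prior)
    likelihoods[k]  = p(y_t | regime_t = k) from regime k's predictive model
    """
    predicted = belief @ transition         # P(regime_t = k | y_{1:t-1})
    unnormalized = predicted * likelihoods  # Bayes-rule numerator
    return unnormalized / unnormalized.sum()

def predictive_weights(belief: np.ndarray, transition: np.ndarray) -> np.ndarray:
    """Mixture weights of the step-ahead predictive p(y_{t+1} | y_{1:t}) = sum_k w_k p_k(y_{t+1})."""
    return belief @ transition
```

For long streams, the same recursion is usually run in log space to avoid numerical underflow.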

🟢 Applied

On the Role of DAG topology in Energy-Aware Cloud Scheduling : A GNN-Based Deep Reinforcement Learning Approach

💡 This research explores techniques in machine learning.

Cloud providers must assign heterogeneous compute resources to workflow DAGs while balancing competing objectives such as completion time, cost, and energy consumption. We identify specific out-of-distribution conditions under which GNN-based deep reinforcement learning schedulers fail. We demonstrate that performance degradation stems from structural mismatches between training and deployment environments.

Abstract ↗ PDF ↗

🟢 Applied

Do LLMs Follow Their Own Rules? A Reflexive Audit of Self-Stated Safety Policies

💡 This research explores techniques in language AI.

Existing benchmarks evaluate models against external standards but do not measure whether models understand and enforce their own stated boundaries. We introduce the Symbolic-Neural Consistency Audit (SNCA), a framework that extracts a model's self-stated safety rules via structured prompts and formalizes them as typed predicates.

Abstract ↗ PDF ↗

🟡 Advanced

MixFlow: Mixed Source Distributions Improve Rectified Flows

💡 This research improves rectified flow generative models with mixed source distributions.

MixFlow trains a flow model on linear mixtures of a fixed unconditional distribution and a $\kappa\texttt{-FC}$-based distribution. This simple mixture improves the alignment between the source and the data, yields better generation quality with fewer sampling steps, and considerably accelerates training convergence.

Abstract ↗ PDF ↗

🟡 Advanced

Generalization and Scaling Laws for Mixture-of-Experts Transformers

💡 This research explores techniques in machine learning.

We develop a theory of generalization and scaling for Mixture-of-Experts (MoE) Transformers. By conditioning on fixed routing patterns and union-bounding across them, we derive a sup-norm covering-number bound whose metric entropy scales with the active parameter budget and incurs an MoE-specific routing overhead.

Abstract ↗ PDF ↗

🟢 Applied

Truncated Rectified Flow Policy for Reinforcement Learning with One-Step Sampling

💡 This research explores techniques in machine learning.

Maximum entropy reinforcement learning (MaxEnt RL) has become a standard framework for sequential decision making. Its standard Gaussian policy parameterization is inherently unimodal, limiting its ability to model complex multimodal action distributions. We propose Truncated Rectified Flow Policy (TRFP), a framework built on a hybrid deterministic-stochastic architecture.

Abstract ↗ PDF ↗

🟢 Applied

A fast and Generic Energy-Shifting Transformer for Hybrid Monte Carlo Radiotherapy Calculation

💡 This research speeds up Monte Carlo dose calculation for radiotherapy.

We introduce a novel learning framework for accelerated Monte Carlo (MC) dose calculation, termed Energy-Shifting. This approach leverages deep learning to synthesize 6 MV TrueBeam Linear Accelerator (LINAC) dose distributions directly from monoenergetic inputs under identical beam configurations. We propose a novel 3D architecture termed TransUNetSE3D, featuring Transformer blocks for global context and Residual Squeeze-and-Excitation (SE) modules for adaptive channel-…

Abstract ↗ PDF ↗

🟢 Applied

Score-Driven Rating System for Sports

💡 This research proposes a likelihood-based rating system for players and teams.

This paper introduces a score-driven rating system that employs the score, i.e. the gradient of the log-likelihood, as the updating mechanism for player and team ratings. The proposed framework extends beyond simple win/loss game outcomes and accommodates a wide range of game results, such as point differences and win/draw/loss outcomes.

Abstract ↗ PDF ↗
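In the simplest win/loss case with a logistic model, a score-driven update reduces to an Elo-style rule. The sketch below computes the gradient of the log-likelihood with respect to each player's rating and takes one step. The logistic model, learning rate, and scale are assumptions, the function name is hypothetical, and the paper's framework covers richer outcomes such as point differences and draws.

```python
import math
from typing import Tuple

def score_update(r_i: float, r_j: float, y: float, lr: float, scale: float = 1.0) -> Tuple[float, float]:
    """One score-driven step for the model P(i beats j) = sigmoid((r_i - r_j) / scale).

    y is 1.0 if i won and 0.0 if j won. The score of the log-likelihood
    y*log(p) + (1 - y)*log(1 - p) with respect to r_i is (y - p) / scale,
    and with respect to r_j it is the negative of that.
    """
    p = 1.0 / (1.0 + math.exp(-(r_i - r_j) / scale))
    g = (y - p) / scale
    return r_i + lr * g, r_j - lr * g
```

Setting `scale = 400 / ln 10` and `lr = K * scale` gives the same form as the classic Elo update with K-factor `K`. The two ratings move by equal and opposite amounts, so their sum is preserved.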

🟢 Applied

Identifying Causal Effects Using a Single Proxy Variable

💡 This research explores techniques in machine learning.

Unobserved confounding is a key challenge when estimating the causal effect of a treatment on an outcome in scientific applications. We assume that we observe a single, potentially multi-dimensional proxy variable of the unobserved confounder. We develop a neural network-based estimation framework, SPICE-Net, to estimate causal effects.

Abstract ↗ PDF ↗

🟢 Applied

FIRE-CIR: Fine-grained Reasoning for Composed Fashion Image Retrieval

💡 This research improves composed fashion image retrieval with fine-grained visual reasoning.

Composed image retrieval (CIR) aims to retrieve a target image that depicts a reference image modified by a textual description. Recent vision-language models (VLMs) achieve promising CIR performance by embedding images and text into a shared space. Instead of relying solely on embedding similarity, FIRE-CIR performs question-driven visual reasoning.

Abstract ↗ PDF ↗