arXiv Research Digest

May 04, 2026 • 125 papers across 5 interests

🔬

Efficient ML / Edge AI

🟢 Applied

Make Your LVLM KV Cache More Lightweight

💡 This research shrinks the KV cache of vision-language models.

The Key-Value (KV) cache has become a de facto component of inference in modern Large Vision-Language Models (LVLMs). LightKV employs cross-modality message passing to aggregate informative messages across vision tokens and progressively compress them during prefill.

Abstract ↗ PDF ↗

🟢 Applied

Quantum Gradient-Based Approach for Edge and Corner Detection Using Sobel Kernels

💡 This research applies quantum gradient computation to edge and corner detection in computer vision.

Edge detection refers to identifying points in digital images where intensity changes sharply, indicating object boundaries or structural features. Corners are locations where gray-level intensity changes abruptly in multiple directions and are widely used in feature extraction, object tracking and 3D modeling.
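
As a point of reference, the classical Sobel gradient that the title builds on can be sketched in a few lines. This is the standard (non-quantum) operator, not the paper's quantum algorithm; the function name `sobel_magnitude` and the toy image are illustrative assumptions.

```python
# Classical Sobel kernels for horizontal and vertical intensity gradients.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(image):
    """Return the gradient magnitude of a grayscale image (list of rows).

    Border pixels are left at 0 because the 3x3 window does not fit there.
    """
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            gx = gy = 0.0
            for i in range(3):
                for j in range(3):
                    pixel = image[r + i - 1][c + j - 1]
                    gx += SOBEL_X[i][j] * pixel
                    gy += SOBEL_Y[i][j] * pixel
            out[r][c] = (gx * gx + gy * gy) ** 0.5
    return out


# Hypothetical toy image: dark left half, bright right half (a vertical edge).
toy = [[0, 0, 0, 10, 10, 10] for _ in range(4)]
edges = sobel_magnitude(toy)
```

In this toy image the response is zero in the flat regions and non-zero only in the two columns that straddle the intensity jump, which is the "sharp change" the summary describes.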

Abstract ↗ PDF ↗

🟡 Advanced

Budget Constraints as Riemannian Manifolds

💡 This research explores techniques in language AI.

Assigning one of K options to each of N groups under a total cost budget is a recurring problem in machine learning. The objective (model loss) depends jointly on all assignments and does not decompose across groups. Evolutionary search evaluates the actual loss but lacks gradient information. We propose Riemannian Constrained Optimization (RCO), which augments a standard Adam update with tangent projection.

Abstract ↗ PDF ↗

🟢 Applied

Decouple before Integration: Test-time Synthesis of SFT and RLVR Task Vectors

💡 This research enhances language AI.

Decoupled Test-time Synthesis (DoTS) allows SFT and RLVR checkpoints to be trained independently and synthesizes their capabilities only at inference time via task vector arithmetic. DoTS uses selective sparsification with norm-preserving rescaling to reduce interference.
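
A minimal sketch of task vector arithmetic with sparsification and norm-preserving rescaling, assuming weights are flat lists of floats. The magnitude-based selection rule, the function names, and the `keep_ratio`, `alpha`, and `beta` parameters are illustrative assumptions; the summary does not say how the paper selects or weights entries.

```python
import math


def task_vector(finetuned, base):
    """Task vector = fine-tuned weights minus base weights."""
    return [f - b for f, b in zip(finetuned, base)]


def l2_norm(vec):
    return math.sqrt(sum(v * v for v in vec))


def sparsify_keep_norm(vec, keep_ratio):
    """Keep the largest-magnitude entries, then rescale to the original L2 norm.

    Ties at the threshold are all kept, so slightly more than
    keep_ratio * len(vec) entries may survive.
    """
    k = max(1, int(round(keep_ratio * len(vec))))
    threshold = sorted((abs(v) for v in vec), reverse=True)[k - 1]
    kept = [v if abs(v) >= threshold else 0.0 for v in vec]
    kept_norm = l2_norm(kept)
    scale = l2_norm(vec) / kept_norm if kept_norm > 0 else 0.0
    return [v * scale for v in kept]


def synthesize(base, sft, rlvr, keep_ratio=0.5, alpha=1.0, beta=1.0):
    """Add both sparsified task vectors back onto the base weights."""
    tv_sft = sparsify_keep_norm(task_vector(sft, base), keep_ratio)
    tv_rlvr = sparsify_keep_norm(task_vector(rlvr, base), keep_ratio)
    return [b + alpha * s + beta * r for b, s, r in zip(base, tv_sft, tv_rlvr)]


# Hypothetical four-parameter example.
example = [3.0, 0.1, -4.0, 0.2]
sparse = sparsify_keep_norm(example, keep_ratio=0.5)
```

Rescaling after pruning keeps the overall size of each update unchanged, so dropping small entries does not silently weaken the checkpoint's contribution.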

Abstract ↗ PDF ↗

🟡 Advanced

Randomized Subspace Nesterov Accelerated Gradient

💡 This research reduces the cost of first-order optimization in machine learning.

Randomized-subspace methods reduce the cost of first-order optimization by using only low-dimensional projected-gradient information . The key technical ingredient is a three-sequence formulation tailored to matrix smoothness . The resulting theory establishes accelerated oracle-complexity guarantees .

Abstract ↗ PDF ↗

🟢 Applied

FedKPer: Tackling Generalization and Personalization in Medical Federated Learning via Knowledge Personalization

💡 This research applies federated learning to privacy-preserving medical AI.

Federated learning (FL) holds great potential for medical applications. However, statistical heterogeneity across healthcare institutions poses a major challenge for FL. We introduce FedKPer, which brings knowledge personalization into the training stage of each local device. Generalization is then handled by the global model aggregation process.

Abstract ↗ PDF ↗

🟢 Applied

Learn where to Click from Yourself: On-Policy Self-Distillation for GUI Grounding

💡 This research improves GUI grounding in computer vision.

Graphical User Interface (GUI) grounding maps natural language instructions to visual coordinates of target elements . Recent reinforcement learning methods (e.g., GRPO) have achieved strong performance, but they rely on expensive multiple rollouts and suffer from sparse signals on hard samples . On-policy self-distillation (OPSD) provides dense token-level supervision from a single rollout . In this paper, we present the first OPSD framework tailored for GUI grounding .

Abstract ↗ PDF ↗

🟢 Applied

Persistent Visual Memory: Sustaining Perception for Deep Generation in LVLMs

💡 This research explores techniques in language AI.

Persistent Visual Memory (PVM) is a lightweight learnable module designed to ensure sustained, on-demand visual perception . PVM establishes a distance-agnostic retrieval pathway that directly provides visual embeddings for precise visual perception, thereby structurally mitigating the signal suppression inherent to deep generation .

Abstract ↗ PDF ↗

🟢 Applied

When RAG Chatbots Expose Their Backend: An Anonymized Case Study of Privacy and Security Risks in Patient-Facing Medical AI

💡 This research examines privacy and security risks in language AI.

Patient-facing medical chatbots based on retrieval-augmented generation (RAG) are increasingly promoted to deliver accessible, grounded health information. AI-assisted development lowers the barrier to building them, but they still demand rigorous security, privacy, and governance controls.

Abstract ↗ PDF ↗

🟡 Advanced

Evaluating the Architectural Reasoning Capabilities of LLM Provers via the Obfuscated Natural Number Game

💡 This research evaluates reasoning in language AI.

Large Language Models have achieved notable success on formal mathematics benchmarks such as MiniF2F . It remains unclear whether these results stem from genuine logical reasoning or semantic pattern matching against pre-training data . This paper identifies Architectural Reasoning as the necessary ability for future automated theorem discovery AI .

Abstract ↗ PDF ↗

🟢 Applied

SC-Taxo: Hierarchical Taxonomy Generation under Semantic Consistency Constraints using Large Language Models

💡 This research organizes scientific literature with language AI.

Scientific literature is expanding at an unprecedented pace, making it increasingly challenging to efficiently organize and access domain knowledge . A high-quality scientific taxonomy offers a structured and hierarchical representation of a research field, facilitating literature exploration and topic navigation . We propose a semantic-consistent taxonomy generation (SC-Taxo) framework that leverages large language models with hierarchy-aware refinement stages to ensure semantic consistency .

Abstract ↗ PDF ↗

🟢 Applied

Faithful Extreme Image Rescaling with Learnable Reversible Transformation and Semantic Priors

💡 This research proposes a method for computer vision.

Most extreme rescaling methods struggle to preserve semantically consistent structures and produce realistic details . To alleviate the above problems, we propose FaithEIR, a diffusion-based framework for extreme image rescaling .

Abstract ↗ PDF ↗

🟢 Applied

Quantum Interval Bound Propagation for Certified Training of Quantum Neural Networks

💡 This research brings certified training to quantum machine learning.

Quantum machine learning is a promising field for efficiently learning features of a dataset to perform a specified task, such as classification. Interval bound propagation (IBP) is a popular certified training method in classical machine learning; quantum interval bound propagation (QIBP) carries it over to quantum neural networks.
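
For orientation, classical interval bound propagation through one affine layer and a ReLU can be sketched as below. This is the classical technique only, not the quantum variant from the paper; the function names, the weights, and the input box are illustrative assumptions.

```python
def affine_bounds(lower, upper, weights, bias):
    """Propagate an axis-aligned box through y = W x + b."""
    out_lo, out_hi = [], []
    for row, b in zip(weights, bias):
        lo = hi = b
        for w, l, u in zip(row, lower, upper):
            # A non-negative weight maps the lower input bound to the lower
            # output bound; a negative weight swaps the two roles.
            lo += w * l if w >= 0 else w * u
            hi += w * u if w >= 0 else w * l
        out_lo.append(lo)
        out_hi.append(hi)
    return out_lo, out_hi


def relu_bounds(lower, upper):
    """ReLU is monotone, so it can be applied to each bound directly."""
    return [max(0.0, l) for l in lower], [max(0.0, u) for u in upper]


# Hypothetical input x = (1.0, 0.0) with an L-infinity perturbation of 0.1.
x_lo, x_hi = [0.9, -0.1], [1.1, 0.1]
W = [[1.0, -1.0], [2.0, 1.0]]
b = [0.0, -1.0]
z_lo, z_hi = affine_bounds(x_lo, x_hi, W, b)
a_lo, a_hi = relu_bounds(z_lo, z_hi)
```

Certified training then penalizes the worst-case output inside these bounds, so the guarantee holds for every input in the box rather than only for sampled perturbations.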

Abstract ↗ PDF ↗

🟢 Applied

Position: agentic AI orchestration should be Bayes-consistent

💡 This research explores techniques in language AI.

Bayesian decision theory provides a framework for agentic systems that can help to maintain beliefs over task-relevant latent quantities . This paper articulates practical properties for Bayesian control that fit modern agentic AI systems and human-AI collaboration .

Abstract ↗ PDF ↗

🟢 Applied

EASE: Federated Multimodal Unlearning via Entanglement-Aware Anchor Closure

💡 This research enables unlearning in federated multimodal learning.

Federated Multimodal Learning (FML) trains multimodal models across decentralized clients while keeping their image-text pairs private . But joint embedding training entangles forgotten knowledge across both modalities and client gradient subspaces, hindering federated unlearning . We present EASE, an Entanglement-Aware Subspace Excision framework that closes all three anchor channels under unified design .

Abstract ↗ PDF ↗

🟢 Applied

PhysEdit: Physically-Consistent Region-Aware Image Editing via Adaptive Spatio-Temporal Reasoning

💡 This research speeds up inference in computer vision.

PhysEdit introduces two inference-time modules that compose without retraining the backbone . PhysEdit delivers a 1.18x wall-clock speedup (64.3s vs. 76.1s per sample) on the full 737-case ImgEdit Basic-Edit Suite .

Abstract ↗ PDF ↗

🟢 Applied

Adaptive Querying with AI Persona Priors

💡 This research explores techniques in language AI.

We study adaptive querying for learning user-dependent quantities of interest, such as responses to held-out items and psychometric indicators, within tight question budgets . We introduce a persona-induced latent variable model that represents a user's state through membership in a finite dictionary of AI personas . This yields expressive priors with closed-form posterior updates and efficient finite-mixture predictions .
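
The closed-form update for a finite mixture can be sketched as follows, assuming each persona assigns a fixed probability to a "yes" answer on every item. The two-persona dictionary, the probabilities, and the function names are hypothetical; the summary does not specify the paper's response model.

```python
def update_posterior(prior, likelihoods):
    """Bayes rule over a finite set of personas: w_k is proportional to w_k * p_k(answer)."""
    unnormalized = [w * l for w, l in zip(prior, likelihoods)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def predict_yes(weights, persona_yes_prob):
    """Finite-mixture prediction for a held-out item."""
    return sum(w * p for w, p in zip(weights, persona_yes_prob))


# Two hypothetical personas with a uniform prior.
prior = [0.5, 0.5]
asked_item_yes = [0.9, 0.2]      # P(yes | persona) on the question we ask
held_out_item_yes = [0.7, 0.1]   # P(yes | persona) on an item we never ask

# The user answers "yes" to the asked item.
posterior = update_posterior(prior, asked_item_yes)
prediction = predict_yes(posterior, held_out_item_yes)
```

Because the posterior is just a reweighting of a fixed dictionary, each answer costs one pass over the personas, which is what makes adaptive question selection affordable under a tight budget.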

Abstract ↗ PDF ↗

🟢 Applied

ML-Bench&Guard: Policy-Grounded Multilingual Safety Benchmark and Guardrail for Large Language Models

💡 This research explores techniques in language AI.

ML-Bench is a policy-grounded multilingual safety benchmark covering 14 languages . ML-Guard is a Diffusion Large Language Model (dLLM)-based guardrail model that supports multilingual judgment and policy-conditioned compliance assessment .

Abstract ↗ PDF ↗

🟢 Applied

Static and Dynamic Graph Alignment Network for Temporal Video Grounding

💡 This research enhances computer vision.

Temporal Video Grounding (TVG) aims to localize temporal moments in an untrimmed video that semantically correspond to given natural language queries . Graph Convolutional Networks (GCN) have been widely adopted in TVG to model temporal relations among video clips and enhance contextual reasoning by constructing clip-level graphs .

Abstract ↗ PDF ↗

🟢 Applied

InpaintSLat: Inpainting Structured 3D Latents via Initial Noise Optimization

💡 This research optimizes machine learning.

We present a training-free approach for controllable 3D inpainting based on initial noise optimization . The underlying geometric structure is established during the early stages of the diffusion process and exhibits high sensitivity to the initial noise .

Abstract ↗ PDF ↗

🟢 Applied

Affordance Agent Harness: Verification-Gated Skill Orchestration

💡 This research orchestrates affordance-grounding skills in computer vision.

Affordance grounding requires identifying where and how an agent should interact in open-world scenes, where actionable regions are often small, occluded, reflective, and visually ambiguous . Recent systems combine multiple skills (e.g., detection, segmentation, interaction-imagination) yet most orchestrate them with fixed pipelines that are poorly matched to per-instance difficulty .

Abstract ↗ PDF ↗

🟢 Applied

PEACE: Cross-modal Enhanced Pediatric-Adult ECG Alignment for Robust Pediatric Diagnosis

💡 This research improves pediatric ECG diagnosis.

PEACE is a structured cross-modal alignment framework for adult-to-pediatric ECG transfer. PEACE integrates tri-axial clinical semantic decomposition, label-query feature extraction, and curriculum-gated optimization to align adult ECG representations with pediatric diagnostic targets. On ZZU-pECG, PEACE achieves 59.39%, 79.03%, and 90.89% AUC on the shared PTB-XL label space.

Abstract ↗ PDF ↗

🟢 Applied

Learning Multimodal Energy-Based Model with Multimodal Variational Auto-Encoder via MCMC Revision

💡 This research trains generative models on multimodal data.

Energy-based models (EBMs) are well-suited to capture complex dependencies in multimodal data . Multimodal VAEs have made progress in capturing such inter-modal dependencies by introducing a shared latent generator and a joint inference model . We present a learning framework that effectively interweaves their updates with corresponding MCMC refinements in both the data and latent spaces .

Abstract ↗ PDF ↗

🟢 Applied

Affinity Is Not Enough: Recovering the Free Energy Principle in Mixture-of-Experts

💡 This research explores techniques in machine learning.

Sparse MoE routing fails at domain transitions, where the current token belongs to one distribution and the next to another . The mechanisms draw from Friston's Free Energy Principle and use LIF dynamics from spiking neural networks . In a controlled experiment (4 experts, 5 seeds), standard affinity routing assigns only 0.006 +/- 0.001 probability to the correct expert at the transition .

Abstract ↗ PDF ↗

🟢 Applied

Posterior Augmented Flow Matching

💡 This research explores techniques in computer vision.

Posterior-Augmented Flow Matching (PAFM) replaces single-target supervision with an expectation over an approximate posterior of valid target completions for a given intermediate state and condition . PAFM improves over FM by up to 3.4 FID50K across different model scales (SiT-B/2 and SiT-XL/2) and different architectures .

Abstract ↗ PDF ↗

🔬

Privacy-Preserving ML

🟡 Advanced

Scaling Federated Linear Contextual Bandits via Sketching

💡 This research scales federated bandit learning for privacy-preserving AI.

Federated Sketch Contextual Linear Bandits (FSCLB) uses SVD to indirectly obtain the determinant required for communication. FSCLB reduces computational and communication costs by over 90% while sacrificing only a negligible amount of cumulative reward.

Abstract ↗ PDF ↗

🟢 Applied

Federated Learning with Hypergradient-based Online Update of Aggregation Weights

💡 This research proposes a method for privacy-preserving AI.

Federated learning using mobile and Internet of Things devices requires high adaptability to varying communication environments . FedHAW (Federated Learning with Hypergradient-based update of Aggregation Weights) implements online updates of aggregation weights .

Abstract ↗ PDF ↗

🟢 Applied

Defense against Poisoning Attacks under Shuffle-DP

💡 This research defends differentially private analytics against poisoning attacks.

Differential Privacy (DP) has become the gold standard for protecting individual privacy in data analytics . The shuffle-DP model has attracted significant attention from both academia and industry due to its favorable balance between privacy and utility . In real-world scenarios, adversarial users can exploit this vulnerability through poisoning attacks .

Abstract ↗ PDF ↗

🟢 Applied

Meritocratic Fairness in Budgeted Combinatorial Multi-armed Bandits via Shapley Values

💡 This research proposes a method for privacy-preserving AI.

We propose a new framework for meritocratic fairness in budgeted combinatorial multi-armed bandits with full-bandit feedback (BCMAB-FBF). We show that the $K$-Shapley value is a unique solution concept that satisfies the Symmetry, Linearity, Null player, and Efficiency properties.
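
The axioms named above can be checked on a toy game with the standard Shapley value. This sketch computes the classical Shapley value by enumerating orderings, not the paper's $K$-Shapley variant; the three-player game and all names are illustrative assumptions.

```python
from itertools import permutations


def shapley_values(players, value):
    """Average each player's marginal contribution over all orderings."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            totals[p] += value(coalition | {p}) - value(coalition)
            coalition = coalition | {p}
    return {p: t / len(orderings) for p, t in totals.items()}


def toy_value(coalition):
    """Hypothetical game: arms 'a' and 'b' earn 10 together; 'c' adds nothing."""
    return 10.0 if {"a", "b"} <= coalition else 0.0


players = ["a", "b", "c"]
phi = shapley_values(players, toy_value)
```

Here `a` and `b` receive equal shares (Symmetry), `c` receives nothing (Null player), and the shares sum to the value of the full coalition (Efficiency). Enumerating orderings costs factorial time, so this only suits very small games.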

Abstract ↗ PDF ↗

🟢 Applied

Unlearning Offline Stochastic Multi-Armed Bandits

💡 This research studies machine unlearning for privacy-preserving AI.

Machine unlearning aims to unlearn data points from a learned model, offering a principled way to process data-deletion requests and mitigate privacy risks without full retraining . We conduct a systematic study of both single- and multi-source unlearning scenarios .

Abstract ↗ PDF ↗

🟡 Advanced

Zero-Knowledge Model Checking

💡 This research explores techniques in privacy-preserving formal verification.

The method combines a deductive approach to model checking, which yields a formal certificate of correctness for the system, with zero-knowledge proofs, which convince an external verifier that the system (kept secret) complies with its specification of correctness (made public).

Abstract ↗ PDF ↗

🟢 Applied

HyCOP: Hybrid Composition Operators for Interpretable Learning of PDEs

💡 This research explores techniques in machine learning.

HyCOP learns parametric PDE solution operators by composing simple modules (advection, diffusion, learned closures) in a query-conditioned way . Modules may be numerical sub-solvers or learned components, enabling hybrid surrogates evaluated at arbitrary query times without autoregressive rollout .

Abstract ↗ PDF ↗

🟢 Applied

Generating Statistical Charts with Validation-Driven LLM Workflows

💡 This research explores techniques in language AI.

A structured LLM-based workflow decomposes chart generation into dataset screening, plot proposal, code synthesis, rendering, validation-driven refinement, description generation, and question-answer generation . It treats chart generation as an inspectable process rather than a one-shot prompt-to-code task . The results show that chart-syntax questions are nearly saturated, while value extraction, comparison and reasoning remain more challenging .

Abstract ↗ PDF ↗

🟢 Applied

RunAgent: Interpreting Natural-Language Plans with Constraint-Guided Execution

💡 This research proposes a method for language AI.

RunAgent is a multi-agent plan execution platform that interprets natural-language plans while enforcing stepwise execution through constraints and rubrics. RunAgent bridges the expressiveness of natural language with the determinism of programming via an agentic language. Evaluations on the Natural-plan and SciBench datasets show that RunAgent outperforms baseline LLMs.

Abstract ↗ PDF ↗

🟢 Applied

Repurposing Image Diffusion Models for Adversarial Synthetic Structured Data: A Case Study of Ground Truth Drift

💡 This research explores techniques in computer vision.

Public image diffusion models are now powerful enough that an attacker without the resources to train a tabular-specific generator may repurpose one off the shelf . An attacker succeeds with synthetic evidence by thinking like the machine that will receive it . The more the attacker succeeds, the more they can induce ground truth drift .

Abstract ↗ PDF ↗

🟢 Applied

SAVGO: Learning State-Action Value Geometry with Cosine Similarity for Continuous Control

💡 This research improves machine learning.

State-Action Value Geometry Optimization (SAVGO) incorporates value-based similarity into policy updates . SAVGO learns a joint state-action embedding space in which pairs with similar action-value estimates exhibit high cosine similarity .

Abstract ↗ PDF ↗

🟢 Applied

Observable Performance Does Not Fully Reflect System Organization: A Multi-Level Analysis of Gait Dynamics Under Occlusal Constraint

💡 This research explores techniques in machine learning.

In biomechanical systems, observable performance is often used as a proxy for underlying system organization . In this study, the vertical dimension of occlusion (VDO) is considered as a constraint applied to an adaptive neuromechanical system . A single-case design in a patient with Parkinson's disease allows an intra-individual analysis across repeated conditions .

Abstract ↗ PDF ↗

🟢 Applied

Learning the Helmholtz equation operator with DeepONet for non-parametric 2D geometries

💡 This research explores techniques in machine learning.

This paper deals with solving the Helmholtz equation on non-parametric domains . It uses a physics-informed neural operator network based on the DeepONet framework . This approach enables the encoding of arbitrary geometries, whether they are parameterized or not .

Abstract ↗ PDF ↗

🟢 Applied

Themis: Training Robust Multilingual Code Reward Models for Flexible Multi-Criteria Scoring

💡 This research explores techniques in language AI.

Reward models (RMs) have become an indispensable fixture of the language model (LM) post-training playbook, enabling policy alignment and test-time scaling . Research on application of RMs in code generation has been comparatively sparse, with existing work largely focusing on execution feedback .

Abstract ↗ PDF ↗

🟢 Applied

NonZero: Interaction-Guided Exploration for Multi-Agent Monte Carlo Tree Search

💡 This research proposes a method for machine learning.

Monte Carlo Tree Search scales poorly in cooperative multi-agent domains. Expansion must consider an exponentially large set of joint actions, limiting exploration under realistic search budgets. We propose NonZero, a proposal rule that keeps multi-agent MCTS tractable.

Abstract ↗ PDF ↗

🟢 Applied

Self-Adaptive Multi-Agent LLM-Based Security Pattern Selection for IoT Systems

💡 This research makes more efficient language AI.

ASPO is a self-adaptive multi-agent approach to security pattern selection that integrates Large Language Model (LLM)-based reasoning with deterministic enforcement within a MAPE-K control loop. ASPO explicitly separates stochastic decision generation from execution: LLM agents propose candidate mitigation portfolios, while a deterministic optimisation core enforces closed-world action integrity.

Abstract ↗ PDF ↗

🟢 Applied

Temporal Data Requirement for Predicting Unplanned Hospital Readmissions

💡 This research explores techniques in machine learning.

The optimal time window for unstructured clinical notes is significantly shorter than for structured data . Maximum predictive performance was achieved using notes from just three to six months prior to surgery . Performance using structured data improved as the time window lengthened, but strictly plateaued after twelve months .

Abstract ↗ PDF ↗

🟢 Applied

Weisfeiler Lehman Test on Combinatorial Complexes: Generalized Expressive Power of Topological Neural Networks

💡 This research explores techniques in machine learning.

Combinatorial complexes have unified set-based (e.g., graphs, hypergraphs) and part-whole structures into a common topological framework . Existing topological neural networks and Weisfeiler-Lehman variants remain fragmented, lacking a unified theoretical foundation for topological deep learning .

Abstract ↗ PDF ↗

🔴 Theory-Heavy

Decentralized Proximal Stochastic Gradient Langevin Dynamics

💡 This research proposes a method for machine learning.

Decentralized Proximal Stochastic Gradient Langevin Dynamics (DE-PSGLD) is a decentralized Markov chain Monte Carlo algorithm for sampling from a log-concave probability distribution constrained to a convex domain . Constraints are enforced through a shared proximal regularization based on the Moreau-Yosida envelope, enabling unconstrained updates while preserving consistency with the target constrained posterior .
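
To see how a Moreau-Yosida envelope turns a hard constraint into a smooth penalty, consider a single chain in one dimension. For the indicator of a convex set C, the envelope is dist(x, C)^2 / (2λ) and its gradient is (x − proj_C(x)) / λ. The sketch below omits the decentralized communication between agents and uses exact rather than stochastic gradients; the target, the interval, the step sizes, and the function names are illustrative assumptions.

```python
import math
import random


def project_interval(x, lo, hi):
    """Euclidean projection onto the convex set [lo, hi]."""
    return min(max(x, lo), hi)


def envelope_grad(x, lo, hi, lam):
    """Gradient of the Moreau-Yosida envelope of the indicator of [lo, hi]."""
    return (x - project_interval(x, lo, hi)) / lam


def langevin_step(x, grad_f, lo, hi, step, lam, rng):
    """One unconstrained Langevin update on f plus the smoothed constraint."""
    drift = grad_f(x) + envelope_grad(x, lo, hi, lam)
    return x - step * drift + math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)


# Hypothetical target: density proportional to exp(-x^2 / 2) on [0, 1].
rng = random.Random(0)
x = 0.5
samples = []
for _ in range(2000):
    x = langevin_step(x, lambda v: v, 0.0, 1.0, step=0.01, lam=0.05, rng=rng)
    samples.append(x)
```

Inside the interval the penalty gradient vanishes, so the update is ordinary Langevin dynamics. Outside it, the penalty pulls the iterate back with strength 1/λ, so a smaller λ tracks the constraint more closely but calls for a smaller step size.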

Abstract ↗ PDF ↗

🟢 Applied

Aitchison Embeddings for Learning Compositional Graph Representations

💡 This research presents techniques for graph representation learning.

Many networks naturally admit a role-mixture view, where nodes are best described as mixtures over latent archetypal factors . We propose a compositional graph embedding framework grounded in Aitchison geometry . Nodes are represented as simplex-valued compositions and embedded via isometric log-ratio coordinates .
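
Isometric log-ratio (ILR) coordinates map a D-part composition on the simplex to D − 1 unconstrained real coordinates while preserving Aitchison distances. A minimal sketch using the common pivot (Helmert-type) basis follows; the paper's choice of basis is not stated in the summary, so the basis and the function names here are assumptions.

```python
import math


def clr(composition):
    """Centered log-ratio: log of each part minus the mean log."""
    logs = [math.log(v) for v in composition]
    mean = sum(logs) / len(logs)
    return [value - mean for value in logs]


def ilr(composition):
    """Pivot-basis ILR: contrast the first i parts against part i + 1."""
    logs = [math.log(v) for v in composition]
    coords = []
    for i in range(1, len(logs)):
        mean_first = sum(logs[:i]) / i
        coords.append(math.sqrt(i / (i + 1)) * (mean_first - logs[i]))
    return coords


def euclidean_norm(vec):
    return math.sqrt(sum(v * v for v in vec))


# Hypothetical node described as a mixture over three latent roles.
node = [0.6, 0.3, 0.1]
z = ilr(node)
```

The Euclidean norm of `ilr(node)` equals that of `clr(node)`, which is the isometry property, and rescaling the composition leaves the coordinates unchanged because only ratios between parts matter.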

Abstract ↗ PDF ↗

🟢 Applied

Deep Kernel Learning for Stratifying Glaucoma Trajectories

💡 This research introduces a new approach to computer vision.

Clinicians need tools to identify patients at high risk of progression from sparse and irregularly-sampled electronic health records . We propose a novel deep kernel learning (DKL) architecture that leverages a Gaussian Process (GP) backend .

Abstract ↗ PDF ↗

🟢 Applied

STARE: Step-wise Temporal Alignment and Red-teaming Engine for Multi-modal Toxicity Attack

💡 This research explores techniques in language AI.

Red-teaming Vision-Language Models is essential for identifying vulnerabilities where adversarial image-text inputs trigger toxic outputs . Existing approaches treat image generation as a black box, leaving open the question of when and how toxic semantics emerge during multi-step synthesis . We introduce STARE, a hierarchical reinforcement learning framework that treats the denoising trajectory itself as the attack surface .

Abstract ↗ PDF ↗

🟡 Advanced

Augmented Lagrangian Multiplier Network for State-wise Safety in Reinforcement Learning

💡 This research explores techniques in machine learning.

Safety is a primary challenge in real-world reinforcement learning (RL). Formulating safety requirements as state-wise constraints has become a prominent paradigm. Existing stabilization techniques are designed for scalar multipliers, which are inadequate for state-dependent multiplier networks. To address this challenge, we propose an augmented Lagrangian multiplier network (ALaM) framework.

Abstract ↗ PDF ↗

🟢 Applied

Spiking Sequence Machines and Transformers

💡 This research connects spiking sequence models and transformers.

Sequence learning reduces to similarity-based retrieval over a temporally indexed representation space . We formalise a Phase-Latency Isomorphism showing that sinusoidal positional phase and spike timing are linearly related . Time, phase, and rank are three instantiations of the same computational primitive .

Abstract ↗ PDF ↗

🟢 Applied

Reinforcement Learning with Markov Risk Measures and Multipattern Risk Approximation

💡 This research explores techniques in machine learning.

For a risk-averse finite-horizon Markov Decision Problem, we introduce a special class of Markov coherent risk measures, called mini-batch measures. We also define the class of multipattern risk-averse problems that generalizes the classes of linear systems. We propose an economical version of the $Q$-learning method that streamlines the policy evaluation (backward) step.

Abstract ↗ PDF ↗

🔬

Creative AI / Emotion

🟢 Applied

Prop-Chromeleon: Adaptive Haptic Props in Mixed Reality through Generative Artificial Intelligence

💡 This research uses generative AI to create haptic props for mixed reality.

Prop-Chromeleon is a mixed reality (MR) system based on generative artificial intelligence (AI) that dynamically transforms everyday objects into adaptive passive haptic props through user-provided text prompts.

Abstract ↗ PDF ↗

🟢 Applied

GaMMA: Towards Joint Global-Temporal Music Understanding in Large Multimodal Models

💡 This research improves music understanding in multimodal models.

GaMMA is a state-of-the-art (SoTA) large multimodal model (LMM) designed to achieve comprehensive musical content understanding. It inherits the streamlined encoder-decoder design of LLaVA, enabling effective cross-modal learning between music and language. Our approach combines carefully curated datasets at scale with a progressive training pipeline.

Abstract ↗ PDF ↗

🟢 Applied

Towards Improving Speaker Distance Estimation through Generative Impulse Response Augmentation

💡 This research improves machine learning.

The Room Acoustics and Speaker Distance Estimation (SDE) Challenge at ICASSP 2025 explores the effectiveness of augmented room impulse response (RIR) data for improving SDE model performance.

Abstract ↗ PDF ↗

🟢 Applied

The impact of coercive, normative, and mimetic Stress on Chinese teachers' continuance intention to use generative AI: An integrated perspective of the Expectation-Confirmation Model and Institutional Theory

💡 This research studies teachers' continued use of generative AI.

Chinese teachers' continuance intention to use generative artificial intelligence (AI) is investigated by integrating the Expectation-Confirmation Model with Institutional Theory . Confirmation, perceived usefulness, and satisfaction play important roles in shaping teachers' continued use of AI .

Abstract ↗ PDF ↗

🟢 Applied

Physically Native World Models: A Hamiltonian Perspective on Generative World Modeling

💡 This research grounds generative world models in Hamiltonian physics.

World models have recently re-emerged as a central paradigm for embodied intelligence, robotics, autonomous driving, and model-based reinforcement learning. We propose Hamiltonian World Models as a physically grounded perspective on world modeling. The key idea is to encode observations into a structured latent phase space and evolve the latent state through Hamiltonian-inspired dynamics.
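
What "evolving a state through Hamiltonian dynamics" looks like can be shown with a leapfrog integrator on a hand-written Hamiltonian. In the paper the phase space is a learned latent one; here a one-dimensional harmonic oscillator, H(q, p) = p²/2 + q²/2, stands in for it, and every name is an illustrative assumption.

```python
def leapfrog(q, p, grad_potential, dt, steps):
    """Symplectic integration of dq/dt = p, dp/dt = -dV/dq."""
    for _ in range(steps):
        p -= 0.5 * dt * grad_potential(q)   # half step in momentum
        q += dt * p                         # full step in position
        p -= 0.5 * dt * grad_potential(q)   # second half step in momentum
    return q, p


def energy(q, p):
    """Total energy of the stand-in harmonic oscillator."""
    return 0.5 * p * p + 0.5 * q * q


# Hypothetical rollout: start at rest with unit displacement.
q0, p0 = 1.0, 0.0
q1, p1 = leapfrog(q0, p0, grad_potential=lambda q: q, dt=0.01, steps=1000)
```

A symplectic update keeps the energy close to its initial value over long rollouts, which is the kind of physical consistency a Hamiltonian prior is meant to give a world model.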

Abstract ↗ PDF ↗

🟢 Applied

BlenderRAG: High-Fidelity 3D Object Generation via Retrieval-Augmented Code Synthesis

💡 This research presents techniques for language AI.

BlenderRAG is a retrieval-augmented generation system that operates on a curated multimodal dataset of 500 expert-validated examples (text, code, image) across 50 object categories . The dataset and code will be available at https://github.com/MaxRondelli/BlenderRAG .

Abstract ↗ PDF ↗

🟢 Applied

Linking Behaviour and Perception to Evaluate Meaningful Human Control over Partially Automated Driving

💡 This research evaluates meaningful human control over partially automated driving.

Meaningful human control (MHC) has been proposed as a normative framework to address this tension . But empirical methods for evaluating whether existing systems provide MHC remain underdeveloped .

Abstract ↗ PDF ↗

🟢 Applied

MMAudio-LABEL: Audio Event Labeling via Audio Generation for Silent Video

💡 This research explores techniques in speech processing.

Recent advances in multimodal generation have enabled high-quality audio generation from silent videos . Practical applications, such as sound production, demand explicit sound event labels detailing the type and timing of sounds . We propose MMAudio-LABEL (LAtent-Based Event Labeling), an event-aware audio generation framework .

Abstract ↗ PDF ↗

🟢 Applied

On the Role of Artificial Intelligence in Human-Machine Symbiosis

💡 This research explores techniques in computer vision.

The evolution of artificial intelligence (AI) has rendered the boundary between humanity and machines increasingly ambiguous . In general, the role assumed by AI is often specified, either implicitly or explicitly in the input prompt, yet becomes less apparent or altogether unobservable when the generated content alone is available . This study considers the problem of tracing the functional role played by AI in natural language generation .

Abstract ↗ PDF ↗

🟢 Applied

MMAudioReverbs: Video-Guided Acoustic Modeling for Dereverberation and Room Impulse Response Estimation

💡 This research explores techniques in computer vision.

Video-to-audio (V2A) models do not explicitly model room-acoustic effects such as reverberation or room impulse responses . MMAudioReverbs is a unified framework dealing with dereverberation and room impulse response (RIR) estimation .

Abstract ↗ PDF ↗

🟢 Applied

Skills as Verifiable Artifacts: A Trust Schema and a Biconditional Correctness Criterion for Human-in-the-Loop Agent Runtimes

💡 This research explores techniques in language AI.

A piece of content claims a behavior; the runtime must decide whether to believe it . Without skill verification, a human-in-the-loop gate must fire on every irreversible call . With skill verification treated as a separate, gated process, HITL fires only for what is unverified, and the system becomes sustainable .

Abstract ↗ PDF ↗

🟢 Applied

LASE: Language-Adversarial Speaker Encoding for Indic Cross-Script Identity Preservation

💡 This research explores techniques in speech processing.

A speaker encoder used in multilingual voice cloning should treat the same speaker identically regardless of which script the audio was uttered in . Off-the-shelf encoders do not, and the failure is accent-conditional . We present LASE (Language-Adversarial Speaker Encoder), a small projection head over frozen WavLM-base-plus trained with two losses . LASE matches ECAPA-TDNN on cross-script speaker recall (0

Abstract ↗ PDF ↗

🟢 Applied

Directed Social Regard: Surfacing Targeted Advocacy, Opposition, Aid, Harms, and Victimization in Online Media

💡 This research explores techniques in language AI.

Directed Social Regard (DSR) is an approach to multi-dimensional, multi-valence sentiment analysis. NLP tools cannot report that positive and negative sentiments coexist. The DSR approach comprises a pair of transformer-based models that (1) detect span-level targets of sentiment in a message and (2) score all spans within the message context along three (-1, 1) axes of regard.

Abstract ↗ PDF ↗

🟢 Applied

Empowering Heterogeneous Graph Foundation Models via Decoupled Relation Alignment

💡 This research improves heterogeneous graph foundation models.

Decoupled relation Subspace Alignment (DRSA) introduces a dual-relation subspace projection mechanism to coordinate cross-type interactions within a shared low-rank relation subspace explicitly . DRSA constructs a well-calibrated, structure-aware latent space .

Abstract ↗ PDF ↗

🟢 Applied

Structure Liberates: How Constrained Sensemaking Produces More Novel Research Output

💡 This research explores techniques in language AI.

SCISENSE is a sensemaking-grounded framework that operationalizes ideation as a structured sequence of eight cognitive stages (Pirolli & Card, 2005). We construct a 100K-scale dataset of citation-conditioned research trajectories in two modes: Target and Infer. Target-trained models achieve a 2.0% improvement in trajectory quality over Infer models. This advantage propagates downstream: coding agents conditioned on Target trajectories produce research artifacts with higher

Abstract ↗ PDF ↗

🟢 Applied

Silicon Showdown: Performance, Efficiency, and Ecosystem Barriers in Consumer-Grade LLM Inference

💡 This research presents techniques for language AI.

The operational landscape of local Large Language Model (LLM) inference has shifted from lightweight models to datacenter-class weights exceeding 70B parameters . We conclude that for consumer-grade inference, optimal hardware is defined by a complex interplay between compute density (Nvidia) and memory capacity .

Abstract ↗ PDF ↗

🟢 Applied

Unsupervised Denoising of Real Clinical Low Dose Liver CT with Perceptual Attention Networks

💡 This research reduces noise in computer vision.

This paper focuses on the denoising problem of low-dose computed tomography using deep learning . The proposed framework combines a U-Net structure for multi-scale feature extraction and an attention mechanism for feature fusion . It also introduces perceptual loss to improve the network for the characteristics of medical images .

Abstract ↗ PDF ↗

🟢 Applied

To Call or Not to Call: A Framework to Assess and Optimize LLM Tool Calling

💡 This research explores techniques in language AI.

Effective tool use hinges on a core LLM decision: whether or not to call a tool when performing a task . This decision is particularly challenging for web search tools, where the benefits of external information depend on the model's internal knowledge . We introduce a principled framework inspired by decision-making theory to evaluate web search tool-use decisions .

Abstract ↗ PDF ↗

🟢 Applied

Possibilistic Predictive Uncertainty for Deep Learning

💡 This research achieves better machine learning.

Dirichlet-approximated possibilistic posterior predictions (DAPPr) is a principled framework leveraging possibility theory . We define a possibilistic posterior over parameters, project this posterior to the prediction space via supremum operators, and approximate the projected posterior using learnable Dirichlet possibility functions .

Abstract ↗ PDF ↗

🟢 Applied

AI Washing Inflates Expected Performance but Not Interaction Outcomes: An AI Placebo Study Using Fitts' Law

💡 This research explores techniques in edge computing.

Expectations about the support of artificial intelligence may influence interaction outcomes similar to placebos . Such expectations may result from AI washing, a practice of overstating a system's capabilities when actual functionality is limited .

Abstract ↗ PDF ↗

🟢 Applied

DySRec: Dynamic Context-Aware Psychometric Scale Recommendation via Multi-Agent Collaboration

💡 This research explores techniques in machine learning.

DySRec operates as an interactive chatbot that engages users in multi-turn dialogue . It models scale selection as a continuous conversational decision process, and coordinates specialized agents to maintain user context, recommend assessment scales, monitor psychological risk, and log decision trajectories .

Abstract ↗ PDF ↗

🟢 Applied

Instance-Aware Parameter Configuration in Bilevel Late Acceptance Hill Climbing for the Electric Capacitated Vehicle Routing Problem

💡 This research optimizes machine learning.

A single globally tuned configuration often fails to exploit the heterogeneity of instances . This limitation is particularly evident in the Electric Capacitated Vehicle Routing Problem . The proposed approach achieves an average objective value reduction of 0.28% across eight held-out test instances .

Abstract ↗ PDF ↗

🟢 Applied

Pick and Sort for Graphical Authentication

💡 This research proposes a method for computer vision.

We propose a graphical authentication scheme that follows a simple "Pick and Sort" design . Users choose visual elements and arrange them within a grid . The number of selected elements and the grid size are configurable . The scheme is easy to learn and flexible to deploy .

Abstract ↗ PDF ↗

🟢 Applied

Hierarchical Abstract Tree for Cross-Document Retrieval-Augmented Generation

💡 This research enhances language AI.

Ψ-RAG is a tree-RAG framework with two key components . The first is a hierarchical abstract tree index built through an iterative "merging and collapse" process . The second is a multi-granular retrieval agent that interacts with the knowledge base through reorganized queries and an agent-powered hybrid retriever .

Abstract ↗ PDF ↗

🟢 Applied

Space Network of Experts: Architecture and Expert Placement

💡 This research explores techniques in language AI.

The Space Network of Experts (Space-XNet) framework targets distributed execution of a popular mixture-of-experts (MoE) model in space . Space data centers are envisioned as a promising platform for executing energy-intensive large language models (LLMs) .

Abstract ↗ PDF ↗

🔬

Lightweight Systems

🟢 Applied

At the Edge of the Heart: ULP FPGA-Based CNN for On-Device Cardiac Feature Extraction in Smart Health Sensors for Astronauts

💡 This research speeds up edge computing.

The convergence of accelerating human spaceflight ambitions and critical terrestrial health monitoring demands is driving unprecedented requirements for reliable, real-time feature extraction on extremely resource-constrained wearable health sensors . The implementation achieves a validation accuracy of 98% while consuming only 8.55 mW .

Abstract ↗ PDF ↗

🟢 Applied

Lightweight Tamper-Evident Log Integrity Verification for IoT Edge Environments: A Merkle Tree Pipeline with Adaptive Chunking

💡 This research explores techniques in edge computing.

This paper presents a lightweight, evaluated integrity verification pipeline that combines Merkle-tree commitments with resource-aware adaptive chunking to provide tamper evidence without relying on distributed ledger technologies . Tampering detection achieves perfect precision, recall, and F1-score (1.0) across corruption ratios ranging from 1% to 50% .
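
As a rough illustration of how a Merkle-tree commitment gives tamper evidence over a log, the sketch below hashes fixed-size chunks of log lines into a single root. It is not the paper's pipeline: the fixed `chunk_size`, the function names, and the leaf/node prefix bytes are assumptions made for this example, and the resource-aware adaptive chunking is not reproduced.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def chunk_log(lines: list[str], chunk_size: int) -> list[bytes]:
    # Hypothetical fixed-size chunking; the paper adapts chunk size to resources.
    return [
        "\n".join(lines[i:i + chunk_size]).encode("utf-8")
        for i in range(0, len(lines), chunk_size)
    ]


def merkle_root(chunks: list[bytes]) -> bytes:
    if not chunks:
        raise ValueError("need at least one chunk")
    # Different prefixes for leaves and internal nodes (an assumed convention).
    level = [sha256(b"\x00" + c) for c in chunks]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = [
            sha256(b"\x01" + level[i] + level[i + 1])
            for i in range(0, len(level), 2)
        ]
    return level[0]


log_lines = [f"event {i}" for i in range(10)]
committed_root = merkle_root(chunk_log(log_lines, chunk_size=4))

# Editing any line changes the chunk hash and therefore the root.
tampered = list(log_lines)
tampered[7] = "event 7 (edited)"
is_intact = merkle_root(chunk_log(tampered, chunk_size=4)) == committed_root
```

Here `is_intact` ends up false for the edited log; a verifier that stored only `committed_root` can detect the change without keeping a second copy of the log.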

Abstract ↗ PDF ↗

🟢 Applied

Progressive Semantic Communication for Efficient Edge-Cloud Vision-Language Models

💡 This research focuses on running AI on low-power devices for language AI.

Deploying Vision-Language Models (VLMs) on edge devices remains challenging due to their substantial computational and memory demands . Fully offloading inference to the cloud is often impractical in bandwidth-limited environments . We propose a progressive semantic communication framework for edge-cloud VLM inference .

Abstract ↗ PDF ↗

🟢 Applied

NVLLM: A 3D NAND-Centric Architecture Enabling Edge on-Device LLM Inference

💡 This research speeds up language AI.

NVLLM is a 3D NAND-centric inference architecture that offloads feed-forward network (FFN) computation into the Flash while executing attention on lightweight CMOS logic with external DRAM . The rapid growth of LLMs demands high-throughput, memory-capacity-intensive inference on resource-constrained edge devices .

Abstract ↗ PDF ↗

🟢 Applied

Tempus: A Temporally Scalable Resource-Invariant GEMM Streaming Framework for Versal AI Edge

💡 This research improves language AI.

Tempus is a Resource-Invariant Temporal GEMM framework for the AMD Versal AI Edge SoC . Tempus employs a fixed compute block of 16 AIE-ML cores . The framework maintains a 0.00% utilization of URAM/DSP, yielding 22.0x core frugality .

Abstract ↗ PDF ↗

🟢 Applied

VitaLLM: A Versatile and Tiny Accelerator for Mixed-Precision LLM Inference on Edge Devices

💡 This research speeds up language AI.

VitaLLM is a mixed precision accelerator that enables ternary weight large language models to run efficiently on edge devices . A 16 nm silicon prototype at 1 GHz/0.8 V achieves 72.46 tokens/s in decode and 0.88 s prefill (64 tokens) within 0.214 mm^2 and 120 KB on-chip memory .

Abstract ↗ PDF ↗

🟢 Applied

VitaLLM: A Versatile, Ultra-Compact Ternary LLM Accelerator with Dependency-Aware Scheduling

💡 This research reduces area and power for language AI.

Deploying Large Language Models (LLMs) on resource-constrained edge devices faces bottlenecks in memory bandwidth and power consumption . VitaLLM achieves a decoding throughput of 70.70 tokens/s within an ultra-compact area of 0.223 mm^2 and a power consumption of 65.97 mW .

Abstract ↗ PDF ↗

🟢 Applied

DMRlib: Easy-coding and Efficient Resource Management for Job Malleability

💡 This research explores techniques in machine learning.

Process malleability has proved to have a highly positive impact on resource utilization and global productivity in data centers compared with the conventional static resource allocation policy . However, the non-negligible additional development effort this solution imposes has constrained its adoption by the scientific programming community . We present DMRlib, a library designed to offer the global advantages of process malleability while providing a minimalist MPI-like syntax .

Abstract ↗ PDF ↗

🟢 Applied

DUAL-BLADE: Dual-Path NVMe-Direct KV-Cache Offloading for Edge LLM Inference

💡 This research makes language AI more efficient.

The increasing deployment of Large Language Model (LLM) inference on edge AI systems demands efficient execution under tight memory budgets . A key challenge arises from Key-Value (KV) caches, which often exceed available device memory . We present DUAL-BLADE, a dual-path KV residency framework that dynamically assigns KV tensors to either a page-cache path or an NVMe-direct path based on memory availability .

Abstract ↗ PDF ↗

🟢 Applied

AnTi-MiCS: Analytical Framework for Bounding Time in Embedded Mixed-Criticality Systems

💡 This research improves edge computing.

In Mixed-Criticality (MC) systems, the high Worst-Case Execution Time (WCET) serves as a conservative upper bound representing the task's maximum execution time under all conditions . Opting for a very low value of this WCET enhances processor utilization by scheduling more tasks in LO mode . Employing a larger WCET ensures fewer mode switches, thereby enhancing QoS .

Abstract ↗ PDF ↗

🟢 Applied

AI Inference as Relocatable Electricity Demand: A Latency-Constrained Energy-Geography Framework

💡 This research targets faster predictions in machine learning.

AI inference is becoming a persistent and geographically distributed source of electricity demand . AI inference workloads can sometimes be executed away from the user-facing service location, provided that latency, state locality, capacity and regulatory constraints remain acceptable .

Abstract ↗ PDF ↗

🟢 Applied

Towards the Democratization and Standardization of Dynamic Resources with MPI Spawning

💡 This research makes machine learning more efficient.

This paper presents an efficient tool for managing dynamic resources in production high-performance computing (HPC) settings . We introduce a unified dynamic resource management application programming interface (API) .

Abstract ↗ PDF ↗

🟢 Applied

End-to-End and Phase-Level Performance Optimization for Hyperledger Fabric

💡 This research enhances edge computing.

Hyperledger Fabric (HLF) is a modular, permissioned blockchain widely adopted in enterprise settings . We present a systematic, phase-level and end-to-end study of HLF optimization along three fronts .

Abstract ↗ PDF ↗

🟢 Applied

A Test Taxonomy and Continuous Integration Ecosystem for Dynamic Resource Management in HPC

💡 This research explores techniques in machine learning.

High-performance computing systems are increasingly exploring dynamic resource management and malleable MPI applications . The correctness of these techniques is often evaluated through ad hoc experiments that can be difficult to reproduce and maintain . The proposed methodology improves early fault detection and simplifies maintenance under evolving dependencies .

Abstract ↗ PDF ↗

🟢 Applied

Efficient, VRAM-Constrained xLM Inference on Clients

💡 This research makes language AI more efficient.

To usher in the next round of client AI innovation, there is an urgent need to enable efficient, lossless inference of high-accuracy large language models and vision language models . To address this, we present pipelined sharding, a novel, benchmark-guided CPU-GPU hybrid scheduling technique .

Abstract ↗ PDF ↗

🟢 Applied

A PVT-Resilient Subthreshold SRAM-Based In-Memory Computing Accelerator with In-Situ Regulation for Energy-Efficient Spiking Neural Networks

💡 This research makes edge computing more efficient.

This paper presents a PVT-resilient, subthreshold SRAM-based computing-in-memory (CIM) macro tailored for energy-efficient spiking neural networks (SNNs) . The macro integrates in-situ current sensors and distributed voltage regulators to enable robust large-scale (1024 wordlines, 1304 bitlines and 128 shared neuron cells) subthreshold current-mode CIM .

Abstract ↗ PDF ↗

🟢 Applied

DPU or GPU for Accelerating Neural Networks Inference -- Why not both? Split CNN Inference

💡 This research speeds up computer vision.

Video and image streaming on edge devices requires low latency . To address this, Neural Networks (NNs) are widely used, and prior work mainly focuses on accelerating them with single hardware units . However, further reductions in latency can be observed by combining these units . In this paper, partitioning CNN inference across DPU and GPU is proposed .

Abstract ↗ PDF ↗

🟢 Applied

Efficient Training on Multiple Consumer GPUs with RoundPipe

💡 This research reduces hardware bottlenecks in language AI.

Fine-tuning Large Language Models on consumer-grade GPUs is constrained by limited memory and slow PCIe interconnects . Pipeline parallelism combined with CPU offloading mitigates hardware bottlenecks by reducing communication overhead .

Abstract ↗ PDF ↗

🟢 Applied

FaaSMoE: A Serverless Framework for Multi-Tenant Mixture-of-Experts Serving

💡 This research makes computer vision more efficient.

Mixture-of-Experts (MoE) models offer high capacity with efficient inference cost by activating a small subset of expert models per input . FaaSMoE decouples the control and execution planes of MoE by deploying experts as stateless FaaS functions, enabling on-demand and scale-to-zero expert invocation across tenants .

Abstract ↗ PDF ↗

🟢 Applied

SplitFT: An Adaptive Federated Split Learning System For LLMs Fine-Tuning

💡 This research makes language AI more efficient.

Federated Split Learning has been identified as an efficient approach to address the computational resource constraints of clients in classical federated learning . However, it faces some critical challenges when such a training strategy meets large language models for fine-tuning . To bridge this gap, we propose SplitFT, an adaptive federated split learning system . SplitFT enables different clients to set different cut layers according to their resources and trained model performance .

Abstract ↗ PDF ↗

🟢 Applied

Folding Tensor and Sequence Parallelism for Memory-Efficient Transformer Training & Inference

💡 This research presents techniques for edge computing.

TSP is a parallel execution strategy that folds tensor parallelism and sequence parallelism onto a single device axis . By sharding both weights and activations across the same devices, TSP trades additional communication volume for reduced memory overhead . We provide theoretical communication and memory analysis and describe our implementation of TSP attention and gated MLP blocks .

Abstract ↗ PDF ↗

🟢 Applied

DAK: Direct-Access-Enabled GPU Memory Offloading with Optimal Efficiency for LLM Inference

💡 This research targets faster predictions in language AI.

DAK is an end-to-end direct-access memory offloading framework that repurposes the Tensor Memory Accelerator (TMA) to fetch offloaded weights and KV caches directly from remote memory into GPU shared memory . DAK achieves near-optimal bandwidth aggregation, with up to 3× performance gains on NVLink-C2C and 1.8× on PCIe systems .

Abstract ↗ PDF ↗

🟢 Applied

Network Digital Untwinning: Towards Backward Optimization of Digital Twins

💡 This research focuses on protecting data privacy in privacy-preserving AI.

Network digital twins (NDTs) are transforming network management by offering precise virtual replicas of physical network systems . Their reliance on diverse and sensitive data introduces significant challenges related to data management, regulatory compliance, and user privacy . Traditional approaches often fall short of preserving the integrity of the twin model .

Abstract ↗ PDF ↗

🟢 Applied

From Impermanent Loss to Sustainable Gain: Quantifying Profitability Zones for Liquidity Providers on DEX

💡 This research explores techniques in machine learning.

Decentralized Finance (DeFi) is a rapidly evolving segment of blockchain technology that enables a transformative approach to financial services through Web3 applications . By leveraging smart contracts, DeFi allows developers to build flexible and innovative financial instruments .

Abstract ↗ PDF ↗

🟢 Applied

Distributed Santa Claus via Global Rounding

💡 This research focuses on running AI locally on devices for edge computing.

In this paper, we consider the Santa Claus problem in the CONGEST model . This NP-hard problem can be modeled as a bipartite graph of children and gifts where an edge indicates that a child desires a gift . Each gift can have a different value .

Abstract ↗ PDF ↗

🔬

Offline-First / Local AI

🟢 Applied

Revealing graph bandits for maximizing local influence

💡 This research explores techniques in edge computing.

We study a graph bandit setting where the objective of the learner is to detect the most influential node of a graph by requesting as little information from the graph as possible . We propose BARE, a bandit strategy for which we prove a regret guarantee that scales with the detectable dimension .

Abstract ↗ PDF ↗

🟢 Applied

Scalable Context-Aware Graph Attention for Unsupervised Anomaly Detection in Large-Scale Mobile Networks

💡 This research explores techniques in machine learning.

Mobile network operators must monitor thousands of heterogeneous network elements across the radio access network and the packet core . Scale and cost of incident labelling make supervised approaches impractical, motivating unsupervised anomaly detection robust to context shifts and nonstationarity . C-MTAD-GAT is an anomaly detection framework designed to operate as a single shared model across large populations of network elements .

Abstract ↗ PDF ↗

🟢 Applied

CleanBase: Detecting Malicious Documents in RAG Knowledge Databases

💡 This research focuses on running AI locally on devices for edge computing.

Retrieval-augmented generation (RAG) is vulnerable to prompt injection attacks, in which an adversary inserts malicious documents containing carefully crafted injected prompts into the knowledge database . When a user issues a question targeted by the attack, the RAG system may retrieve these malicious documents, whose injected prompts mislead it into generating attacker-specified answers . CleanBase constructs a similarity graph over the database, where each node represents a document and an edge connects two nodes if their semantic similarity exceeds a statistically

Abstract ↗ PDF ↗

🟡 Advanced

SAGA: Workflow-Atomic Scheduling for AI Agent Inference on GPU Clusters

💡 This research explores techniques in language AI.

AI agents execute tens to hundreds of chained LLM calls per task . GPU schedulers treat each call as independent, discarding gigabytes of intermediate state between steps and inflating end-to-end latency by 3-8x . We present SAGA, a distributed scheduler that implements this abstraction through three mechanisms .

Abstract ↗ PDF ↗

🟢 Applied

ControBench: An Interaction-Aware Benchmark for Controversial Discourse Analysis on Social Networks

💡 This research explores techniques in language AI.

ControBench is a benchmark for controversial discourse analysis that combines heterogeneous social interaction graphs with rich textual semantics . Built from Reddit discussions on three topics (Trump, abortion, and religion), ControBench contains 7,370 users, 1,783 posts, and 26,525 interactions .

Abstract ↗ PDF ↗

🟢 Applied

Bridging Graph Drawing and Dimensionality Reduction with Stochastic Stress Optimization

💡 This research addresses dimensionality reduction in computer vision.

We present a scikit-learn compatible estimator that minimizes global stress through local pairwise updates, improving upon the existing implementation . Experiments on standard high-dimensional benchmarks show that our stochastic solver converges substantially faster than SMACOF .

Abstract ↗ PDF ↗

🟢 Applied

Class Angular Distortion Index for Dimensionality Reduction

💡 This research addresses dimensionality reduction in computer vision.

Dimensionality reduction (DR) techniques are often characterized by whether they preserve global, high-level structures in data or local, neighborhood structures . Existing cluster quality metrics either only measure cluster separability or assume spherical, globular clusters in the original space . We introduce the Class Angular Distortion Index (CADI), a metric that uses internal angles among point triples to determine the faithfulness of cluster organization .

Abstract ↗ PDF ↗

🟢 Applied

Gradient Regularized Newton Boosting Trees with Global Convergence

💡 This research explores techniques in edge computing.

Restricted Newton Descent studies convex optimization with Newton's method on Hilbert spaces with inexact iterates . Modern implementations like XGBoost, LightGBM, and CatBoost are based on Newton boosting: a second-order descent step in the space of decision trees .

Abstract ↗ PDF ↗

🟡 Advanced

Stable-GFlowNet: Toward Diverse and Robust LLM Red-Teaming via Contrastive Trajectory Balance

💡 This research achieves better language AI.

Large Language Model (LLM) Red-Teaming, which proactively identifies vulnerabilities of LLMs, is an essential process for ensuring safety . Generative Flow Networks (GFNs) that perform distribution matching are notorious for training instability and mode collapse . We propose Stable-GFN, which eliminates partition function $Z$ estimation in GFN and reduces training instability .

Abstract ↗ PDF ↗

🟢 Applied

Scale-Aware Adversarial Analysis: A Diagnostic for Generative AI in Multiscale Complex Systems

💡 This research explores techniques in machine learning.

Complex physical systems, from supersonic turbulence to the macroscopic structure of the universe, are governed by continuous multiscale dynamics . While modern machine learning architectures excel at mapping the high-dimensional observables of these systems, it remains unclear whether they internalize the governing physical laws or merely interpolate discrete statistical correlations .

Abstract ↗ PDF ↗

🟢 Applied

A Comparative Study of QSPR Methods on a Unique Multitask PAMPA dataset

💡 This research presents techniques for edge computing.

We present a unique, multitask dataset comprising 143 drug and drug candidate molecules, each evaluated on in vitro parallel artificial-membrane permeability assays (PAMPA) using six different model membranes . This is the most comprehensive study on simultaneous modeling of multiple organ-specific PAMPA membranes to date .

Abstract ↗ PDF ↗

🟢 Applied

Foresight Arena: An On-Chain Benchmark for Evaluating AI Forecasting Agents

💡 This research explores techniques in language AI.

Foresight Arena is the first permissionless, on-chain benchmark for evaluating AI forecasting agents on real-world prediction markets . Agents submit probabilistic forecasts on binary Polymarket markets via a commit-reveal protocol enforced by Solidity smart contracts . Performance is measured by the Brier Score and a novel Alpha Score .
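
To make the two ingredients named above concrete, here is a minimal sketch of the standard Brier Score for binary outcomes and a hash-based commit-reveal check. The function names are hypothetical, SHA-256 and the `repr`-based encoding stand in for whatever the benchmark's Solidity contracts actually hash, and the Alpha Score is not reproduced because the summary does not define it.

```python
import hashlib


def brier_score(forecasts: list[float], outcomes: list[int]) -> float:
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must be non-empty and aligned")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


def commit(forecast: float, salt: bytes) -> str:
    # Assumed encoding: text form of the forecast followed by a secret salt.
    return hashlib.sha256(repr(forecast).encode("utf-8") + salt).hexdigest()


def reveal_ok(commitment: str, forecast: float, salt: bytes) -> bool:
    # The reveal is valid only if it reproduces the earlier commitment.
    return commit(forecast, salt) == commitment


# An always-0.5 forecaster scores 0.25; lower is better.
baseline = brier_score([0.5, 0.5], [1, 0])  # 0.25

c = commit(0.8, b"secret-salt")
honest = reveal_ok(c, 0.8, b"secret-salt")   # True
changed = reveal_ok(c, 0.6, b"secret-salt")  # False
```

Committing before the market resolves and revealing afterwards prevents an agent from adjusting its forecast once the outcome is known, which is the property a commit-reveal protocol is meant to enforce.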

Abstract ↗ PDF ↗

🟡 Advanced

AdaMeZO: Adam-style Zeroth-Order Optimizer for LLM Fine-tuning Without Maintaining the Moments

💡 This research reduces memory use in language AI.

AdaMeZO is a zeroth-order optimizer that leverages Adam-style first- and second-moment estimates without maintaining them in memory . It can outperform MeZO while requiring up to 70% fewer forward passes .
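
For readers unfamiliar with zeroth-order optimization, the sketch below shows the basic two-forward-pass (SPSA-style) gradient estimate that MeZO-style methods build on. It is not AdaMeZO: the Adam-style moment handling is not described in this summary and is omitted, and `spsa_step`, `lr`, `eps`, and the toy quadratic loss are assumptions made for illustration.

```python
import random


def spsa_step(params, loss_fn, lr=0.05, eps=1e-3, seed=0):
    """One zeroth-order step using two loss evaluations and no backpropagation."""
    rng = random.Random(seed)
    # Random direction; kept as a list here for clarity, although it could be
    # regenerated from the seed instead of being stored.
    z = [rng.gauss(0.0, 1.0) for _ in params]
    plus = [p + eps * zi for p, zi in zip(params, z)]
    minus = [p - eps * zi for p, zi in zip(params, z)]
    # Scalar estimate of the directional derivative along z.
    g = (loss_fn(plus) - loss_fn(minus)) / (2 * eps)
    return [p - lr * g * zi for p, zi in zip(params, z)]


def quadratic(p):
    # Toy loss standing in for a model's forward pass.
    return sum(x * x for x in p)


params = [1.0, -2.0]
initial_loss = quadratic(params)
for step in range(300):
    params = spsa_step(params, quadratic, seed=step)
final_loss = quadratic(params)
```

Each step costs two forward passes and no gradient storage, which is why reducing the number of forward passes, as the summary reports, matters for this family of optimizers.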

Abstract ↗ PDF ↗

🟢 Applied

From Prediction to Practice: A Task-Aware Evaluation Framework for Blood Glucose Forecasting

💡 This research explores techniques in machine learning.

We present a task-aware evaluation framework for blood glucose forecasting built around two downstream uses: hypoglycemia early warning and insulin dosing decision support . We evaluate on real data from three clinical cohorts using event-level recall and false alarms per patient-day . We show that models appearing acceptable overall, with recall above 0.9 on the full test set, can fail badly in the post-bolus slice .

Abstract ↗ PDF ↗

🟢 Applied

Knowing when to trust machine-learned interatomic potentials

💡 This research addresses forecasting in machine learning.

Machine-learned uncertainty-quantification methods rely on ensembles of independently trained backbones . These methods scale unfavorably with foundation-scale MLIPs, and their member-disagreement signals correlate weakly with per-molecule prediction error . The resulting method, PROBE (Post-hoc Reliability frOm Backbone Embeddings), produces a per-prediction reliability probability that monotonically tracks actual error without modification to the underlying

Abstract ↗ PDF ↗

🟢 Applied

Fairness of Classifiers in the Presence of Constraints between Features

💡 This research explores techniques in machine learning.

In Machine Learning, an accepted definition of fairness of a decision taken by a classifier is that it should not depend on protected features, such as gender . We propose that a decision be considered fair if it has a fair explanation . We identify relationships between different definitions of fairness and study the computational complexity of testing fairness of classifiers .

Abstract ↗ PDF ↗

🟢 Applied

Jailbreaking Vision-Language Models Through the Visual Modality

💡 This research explores techniques in language AI.

The visual modality of vision-language models (VLMs) is an underexplored attack surface for bypassing safety alignment . We introduce four jailbreak attacks exploiting the vision component . They include encoding harmful instructions as visual symbol sequences with a decoding legend .

Abstract ↗ PDF ↗

🟡 Advanced

Beyond Continuity: Simulation-free Reconstruction of Discrete Branching Dynamics from Single-cell Snapshots

💡 This research targets faster predictions in machine learning.

Inferring cellular trajectories from destructive snapshots is complicated by the challenges of stochasticity and non-conservative mass dynamics such as cell proliferation and apoptosis . Existing unbalanced Optimal Transport (OT) methods treat mass as a continuous fluid, performing inference at the population level . But this macroscopic view often fails to capture the discrete, jump-like nature of birth-death events at single-cell resolution . We present Unbalanced Schrödinger Bridge (USB

Abstract ↗ PDF ↗

🟢 Applied

Vesselpose: Vessel Graph Reconstruction from Learned Voxel-wise Direction Vectors in 3D Vascular Images

💡 This research explores techniques in computer vision.

The prevailing segment-then-fix paradigm is fundamentally limited regarding its suitability for modeling the task of complete and topologically accurate vascular network reconstruction . We propose an approach to extract topologically more accurate vascular graphs from 3D image data . Our approach achieves state-of-the-art performance on three benchmark datasets, spanning both synthetic and real imagery .

Abstract ↗ PDF ↗

🟢 Applied

Multi-frame Restoration for High-rate Lissajous Confocal Laser Endomicroscopy

💡 This research explores techniques in computer vision.

Lissajous confocal laser endomicroscopy (CLE) is a promising solution for high speed in vivo optical biopsy for handheld scenarios . However, at high frame rates, many pixels remain unvisited, creating structured holes . We propose MIRA, a lightweight recurrent framework for CLE restoration . MIRA outperforms both lightweight and high-complexity baselines in restoration quality .

Abstract ↗ PDF ↗

🟢 Applied

End-to-End Autoregressive Image Generation with 1D Semantic Tokenizer

💡 This research optimizes computer vision.

Autoregressive image modeling relies on visual tokenizers to compress images into compact latent representations . We design an end-to-end training pipeline that jointly optimizes reconstruction and generation . This contrasts with prior two-stage approaches that train tokenizers and generative models separately .

Abstract ↗ PDF ↗

🟢 Applied

LambdaRankIC: Directly Optimizing Rank IC for Financial Prediction

💡 This research addresses forecasting in machine learning.

In financial prediction, the performance of machine learning models is often assessed by Rank IC . Rank IC is the Spearman rank correlation between the model predictions and the realized asset returns . We propose LambdaRankIC, a novel learning-to-rank approach that directly optimizes Rank IC .
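
Since Rank IC is defined above as a Spearman rank correlation, it can be computed as the Pearson correlation of the two rank vectors. The sketch below does this in plain Python with average ranks for ties; the function names and sample numbers are made up for illustration, and it shows only the metric, not the LambdaRankIC training objective.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


def rank_ic(predictions, returns):
    # Spearman correlation = Pearson correlation of the ranks.
    return pearson(average_ranks(predictions), average_ranks(returns))


# Hypothetical scores and realized returns for four assets.
ic = rank_ic([0.1, 0.5, 0.3, 0.9], [-0.02, 0.01, 0.00, 0.20])
```

Because only the ordering matters, the example yields a Rank IC of 1 even though the predicted scores and the realized returns are on very different scales.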

Abstract ↗ PDF ↗

🟢 Applied

Distance metric learning for conditional anomaly detection

💡 This research proposes a method for machine learning.

A recently proposed conditional anomaly detection framework extends anomaly detection to the problem of identifying anomalous patterns on a subset of attributes in the data . The anomaly always depends (is conditioned) on the value of remaining attributes . The work presented in this paper focuses on instance-based methods for detecting conditional anomalies .

Abstract ↗ PDF ↗

🟢 Applied

Trading off rewards and errors in multi-armed bandits

💡 This research presents techniques for machine learning.

In multi-armed bandits, the most-explored arms are the most informative, while reward maximization typically pulls only the best arm . We present an algorithm with regret guarantees that interpolates between the two objectives .

Abstract ↗ PDF ↗

🟢 Applied

Near-optimal and Efficient First-Order Algorithm for Multi-Task Learning with Shared Linear Representation

💡 This research makes machine learning more efficient.

Multi-task learning (MTL) has emerged as a pivotal paradigm in machine learning by leveraging shared structures across multiple related tasks . Despite its empirical success, likelihood-based, efficiently solvable algorithms remain largely underdeveloped . This paper introduces a first-order algorithm that jointly learns a shared representation and task-specific parameters .

Abstract ↗ PDF ↗