arXiv Digest • 2026-05-11

🟢 Applied

Uncertainty-Aware Structured Data Extraction from Full CMR Reports via Distilled LLMs

💡 This research presents techniques for edge computing.

CMR-EXTR is a lightweight framework that converts free-text CMR reports into structured data and assigns per-field confidence for quality control. A teacher-student distillation pipeline enables fully offline inference while limiting manual annotation.

Abstract ↗ PDF ↗

🟢 Applied

Trajectory as the Teacher: Few-Step Discrete Flow Matching via Energy-Navigated Distillation

💡 This research explores techniques in language AI.

Distillation uses the teacher's multi-step trajectory to train a student that reproduces the process in a few steps. Trajectory-Shaped Discrete Flow Matching (TS-DFM) replaces the student's blind jumps with guided navigation. At 8 steps, the shaped student achieves 32% lower perplexity than the 1,024-step teacher while being 128x faster.

Abstract ↗ PDF ↗

🟢 Applied

Conformal Path Reasoning: Trustworthy Knowledge Graph Question Answering via Path-Level Calibration

💡 This research presents techniques for trustworthy knowledge graph question answering.

Conformal Path Reasoning (CPR) is a trustworthy KGQA framework with two key innovations. CPR improves the Empirical Coverage Rate by 34% while reducing average prediction set size by 40% compared to conformal baselines. CPR is a lightweight module trained via PUCT-guided exploration to learn discriminative path-level nonconformity scores.

Abstract ↗ PDF ↗

🟢 Applied

Semiparametric Efficient Test for Interpretable Distributional Treatment Effects

💡 This research explores techniques for testing distributional treatment effects.

DR-ME is the first semiparametrically efficient finite-location test for interpretable distributional treatment effects. The test evaluates an interventional kernel witness at learned outcome locations rather than producing only a global rejection. The results show near-nominal type-I error and competitive power against global doubly robust kernel tests.

Abstract ↗ PDF ↗

🟢 Applied

Graph-Structured Hyperdimensional Computing for Data-Efficient and Explainable Process-Structure-Property Prediction

💡 This research presents data-efficient techniques for machine learning.

PSP-HDC is a graph-structured hyperdimensional computing framework that encodes a directed PSP graph as an internal prior for representation, inference, and explanation. It achieves an accuracy of 0.910 ± 0.077 over 1000 random splits and 0.896 under process-fold generalization.

Abstract ↗ PDF ↗

🟢 Applied

Bayesian Sensitivity of Causal Inference Estimators under Evidence-Based Priors

💡 This research explores sensitivity analysis for causal inference.

Causal inference relies on untestable assumptions about the true data-generating process. Sensitivity analysis helps determine how robust conclusions remain when these underlying assumptions are altered.

Abstract ↗ PDF ↗

🟢 Applied

EmambaIR: Efficient Visual State Space Model for Event-guided Image Reconstruction

💡 This research explores techniques in computer vision.

EmambaIR is an Efficient visual State Space Model designed for image reconstruction using spatially sparse and temporally continuous event streams. The framework introduces two key components: the cross-modal Top-k Sparse Attention Module (TSAM) and the Gated State-Space Module (GSSM). The source code and data are publicly available at: https://github.com/YunhangWickert/EmambaIR

Abstract ↗ PDF ↗

🟢 Applied

Proxy3D: Efficient 3D Representations for Vision-Language Models via Semantic Clustering and Alignment

💡 This research presents techniques for language AI.

Spatial intelligence in vision-language models (VLMs) is attracting research interest, driven by the practical demand to reason in the 3D world. Most existing methods follow the conventional 2D pipeline in VLMs and use pixel-aligned representations for the vision modality. The authors propose Proxy3D, a method with compact yet comprehensive 3D proxy representations.

Abstract ↗ PDF ↗

🟢 Applied

Flow-OPD: On-Policy Distillation for Flow Matching Models

💡 This research presents a post-training framework for Flow Matching models.

Flow-OPD is the first unified post-training framework that integrates on-policy distillation into Flow Matching models. It adopts a two-stage alignment strategy: it first cultivates domain-specialized teacher models via single-reward GRPO fine-tuning, then establishes a robust initial policy through a Flow-based Cold-Start scheme.

Abstract ↗ PDF ↗

🟢 Applied

CA-SQL: Complexity-Aware Inference Time Reasoning for Text-to-SQL via Exploration and Compute Budget Allocation

💡 This research improves language AI.

CA-SQL is a novel Text-to-SQL pipeline that uses the estimated difficulty of a task to dynamically scale the breadth of exploration when generating solution candidates. CA-SQL achieves a state-of-the-art score of 51.72% on the "challenging" tier of BIRD development set problems, using only GPT-4o-mini.

Abstract ↗ PDF ↗

🟢 Applied

Accurate and Efficient Statistical Testing for Word Semantic Breadth

💡 This research presents techniques for machine learning.

A word type can be represented as a cloud of token vectors, with dispersion-based statistics serving as proxies for contextual diversity. The authors propose a Householder-aligned permutation test to isolate dispersion differences from directional differences. Empirically, the alignment reduced Type-I error by 32.5% while preserving sensitivity to genuine breadth differences.
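
To make the idea concrete, here is a minimal sketch of one plausible reading of a Householder-aligned permutation test. It is not the paper's implementation: the dispersion statistic (mean distance to the centroid), the choice to align centroid directions, and every function name below are assumptions made for illustration. The only fact relied on is that a Householder reflection is orthogonal, so it can rotate one cloud's mean direction onto another's without changing its dispersion.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def centroid(cloud):
    n = len(cloud)
    return [sum(col) / n for col in zip(*cloud)]

def unit(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def householder_align(cloud, src_dir, dst_dir):
    """Reflect every vector so that unit vector src_dir lands on unit vector dst_dir."""
    w = [s - d for s, d in zip(src_dir, dst_dir)]
    ww = dot(w, w)
    if ww < 1e-12:  # directions already coincide, nothing to reflect
        return [list(x) for x in cloud]
    # H x = x - 2 w (w . x) / (w . w)
    return [[xi - 2.0 * wi * dot(w, x) / ww for xi, wi in zip(x, w)] for x in cloud]

def dispersion(cloud):
    """Assumed statistic: mean Euclidean distance of token vectors to their centroid."""
    c = centroid(cloud)
    return sum(math.dist(x, c) for x in cloud) / len(cloud)

def permutation_p_value(a, b, n_perm=500, seed=0):
    """Permutation test on the dispersion gap after aligning b's mean direction to a's."""
    b_aligned = householder_align(b, unit(centroid(b)), unit(centroid(a)))
    observed = abs(dispersion(a) - dispersion(b_aligned))
    pooled = a + b_aligned  # new list, so the inputs are not mutated by shuffling
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(dispersion(pooled[:len(a)]) - dispersion(pooled[len(a):]))
        hits += diff >= observed
    return (hits + 1) / (n_perm + 1)
```

Because the reflection leaves each cloud's dispersion unchanged, any difference the test detects after alignment cannot come from the two clouds merely pointing in different directions.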

Abstract ↗ PDF ↗

🟢 Applied

SphereVAD: Training-Free Video Anomaly Detection via Geodesic Inference on the Unit Hypersphere

💡 This research introduces a new approach to computer vision.

Video anomaly detection (VAD) aims to automatically identify events that deviate from normal patterns in untrimmed surveillance videos. Existing methods universally depend on large-scale annotations or task-specific training procedures, severely limiting their rapid deployment to novel scenes. The authors propose SphereVAD, a fully training-free, zero-shot VAD framework.

Abstract ↗ PDF ↗

🟢 Applied

Where's the Plan? Locating Latent Planning in Language Models with Lightweight Mechanistic Interventions

💡 This research presents techniques for language AI.

Future-rhyme information is linearly decodable at the line boundary, with a signal that strengthens with scale in all three model families. Only Gemma-3-27B causally relies on this encoding, exhibiting a handoff in which the causal driver migrates to the rhyme word around layer 30.

Abstract ↗ PDF ↗

🟡 Advanced

It Just Takes Two: Scaling Amortized Inference to Large Sets

💡 This research scales amortized inference to large sets.

The method trains a mean-pool Deep Set on sets of size at most two, producing an encoder that generalizes to arbitrary set sizes. The inference head is fine-tuned on pre-aggregated embeddings, making training cost essentially independent of the deployment set size N.
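
A minimal sketch of why a mean-pool encoder can be applied to sets of any size: each element is embedded independently and the embeddings are averaged, so the output dimension never depends on N. The feature map `phi`, its random weights, and the dimensions below are hypothetical stand-ins, not the paper's architecture or training procedure.

```python
import random

def phi(x, weights):
    """Hypothetical per-element feature map: one linear layer followed by ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]

def mean_pool_encode(items, weights):
    """Embed each element, then average; the result has a fixed size for any set size."""
    embedded = [phi(x, weights) for x in items]
    n = len(embedded)
    return [sum(col) / n for col in zip(*embedded)]

random.seed(0)
# Assumed shapes: 3-dimensional inputs, 4-dimensional embeddings.
W = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(4)]
small = [[random.gauss(0, 1) for _ in range(3)] for _ in range(2)]    # set of size 2
large = [[random.gauss(0, 1) for _ in range(3)] for _ in range(500)]  # set of size 500

z_small = mean_pool_encode(small, W)
z_large = mean_pool_encode(large, W)
z_reversed = mean_pool_encode(list(reversed(large)), W)  # same set, different order
```

Both embeddings have the same length, and reordering the elements changes the result only by floating-point rounding, which is the permutation invariance a set encoder needs.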

Abstract ↗ PDF ↗

🟢 Applied

Statistical inference with belief functions: A survey

💡 This research explores techniques in machine learning.

Belief functions are a powerful framework for the mathematical characterisation of uncertainty. The first step in a reasoning chain based on belief functions is inference: how to learn a belief measure from the available data.

Abstract ↗ PDF ↗

🟢 Applied

Normalizing Trajectory Models

💡 This research presents techniques for generative modeling.

Normalizing Trajectory Models (NTM) models each reverse step as an expressive conditional normalizing flow with exact likelihood training. NTM combines shallow invertible blocks within each step with a deep parallel predictor across the trajectory.

Abstract ↗ PDF ↗

🟢 Applied

Beyond Pairs: Your Language Model is Secretly Optimizing a Preference Graph

💡 This research optimizes language AI.

Direct Preference Optimization (DPO) aligns language models using pairwise preference comparisons. However, in many practical settings, training data consists of multiple rollouts per prompt, inducing rich preference structure that DPO fails to exploit. The authors propose Graph Direct Preferential Optimization, a principled generalization of DPO that operates over directed acyclic preference graphs.
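
For reference, the standard pairwise DPO loss that this work generalizes can be sketched as follows. This shows only the well-known pairwise objective for a single (chosen, rejected) pair; the graph-based generalization is not reproduced here, and the function name and the default `beta` are illustrative choices.

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Pairwise DPO loss for one preference pair.

    Inputs are sequence log-probabilities under the policy and the frozen
    reference model. The loss is -log(sigmoid(margin)).
    """
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    x = -margin
    # Numerically stable softplus: log(1 + exp(x)) = max(x, 0) + log1p(exp(-|x|))
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))
```

When the policy matches the reference, the margin is zero and the loss equals log 2; the loss falls below that as the policy shifts probability toward the chosen response relative to the reference.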

Abstract ↗ PDF ↗

🟢 Applied

Rethinking Dense Optical Flow without Test-Time Scaling

💡 This research improves efficiency in computer vision.

Recent progress in dense optical flow has been driven by increasingly complex architectures and multi-step refinement for test-time scaling. While these approaches achieve strong benchmark performance, they also require substantial computation during inference. The authors argue that powerful visual semantic and geometric priors encoded in modern foundation models can reduce the need for computationally expensive iterative refinement at test time.

Abstract ↗ PDF ↗

🟢 Applied

Self-Play Enhancement via Advantage-Weighted Refinement in Online Federated LLM Fine-Tuning with Real-Time Feedback

💡 This research improves language AI.

SPEAR (Self-Play Enhancement via Advantage-Weighted Refinement) is an efficient online learning algorithm for federated LLM fine-tuning. SPEAR uses a feedback-guided self-play loop to construct naturally contrastive pairs per prompt.

Abstract ↗ PDF ↗

🟢 Applied

LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling

💡 This research improves language AI.

Test-time scaling (TTS) has become an effective approach for improving large language model performance. But existing TTS strategies are largely hand-crafted: researchers manually design reasoning patterns and tune heuristics by intuition. The authors propose an environment-driven framework, AutoTTS, that changes what researchers design. The discovery environment must make the control space tractable and provide cheap, frequent feedback.

Abstract ↗ PDF ↗

🟢 Applied

GRAPHLCP: Structure-Aware Localized Conformal Prediction on Graphs

💡 This research presents techniques for uncertainty quantification on graphs.

Conformal prediction (CP) provides a distribution-free approach to uncertainty quantification with finite-sample guarantees. The combinatorial nature of graphs often leads to insufficiently certain predictions and indiscriminative embeddings. Existing methods primarily rely on embedding-space proximity for localization. The authors propose GRAPHLCP, a proximity-based localized CP framework.
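
As background, plain split conformal prediction for classification can be sketched as below. This is the generic, non-localized procedure, not GRAPHLCP itself; the nonconformity score (one minus the predicted probability of a class) and the function names are assumptions chosen for illustration.

```python
import math

def conformal_threshold(cal_scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:  # too few calibration points for this alpha: include every label
        return math.inf
    return sorted(cal_scores)[k - 1]

def prediction_set(class_probs, threshold):
    """Keep each label whose assumed score, 1 - probability, is within the threshold."""
    return [label for label, p in class_probs.items() if 1.0 - p <= threshold]

# Calibration scores: 1 - probability assigned to the true label on held-out data.
cal = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
q = conformal_threshold(cal, alpha=0.2)
labels = prediction_set({"a": 0.7, "b": 0.25, "c": 0.05}, q)
```

Under exchangeability of calibration and test points, sets built this way contain the true label with probability at least 1 - alpha. Localized variants change how calibration points are weighted or selected for each test point.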

Abstract ↗ PDF ↗

🟢 Applied

MoCoTalk: Multi-Conditional Diffusion with Adaptive Router for Controllable Talking Head Generation

💡 This research presents techniques for computer vision.

Talking-head generation requires joint modeling of identity, head pose, facial expression, and mouth dynamics. Existing methods typically address only a subset of these factors. The authors present MoCoTalk, a multi-conditional video diffusion framework.

Abstract ↗ PDF ↗

🟢 Applied

Fast Byte Latent Transformer

💡 This research speeds up byte-level language models.

Recent byte-level language models match the performance of token-level models without relying on subword vocabularies. The paper addresses the efficiency bottleneck of the Byte Latent Transformer through new training and generation techniques.

Abstract ↗ PDF ↗

🟢 Applied

Object Hallucination-Free Reinforcement Unlearning for Vision-Language Models

💡 This research presents techniques for language AI.

Vision-language models (VLMs) raise growing concerns about privacy, copyright, and bias, motivating machine unlearning to remove sensitive knowledge. Existing methods primarily fine-tune the language decoder, leading to superficial forgetting that fails to erase underlying visual representations. The authors propose HFRU, a reinforcement unlearning framework that operates on the vision encoder for deep semantic removal.

Abstract ↗ PDF ↗

🟢 Applied

Globally Optimal Training of Spiking Neural Networks via Parameter Reconstruction

💡 This research presents training techniques for spiking neural networks.

Spiking Neural Networks (SNNs) have been proposed as biologically plausible and energy-efficient alternatives to conventional Artificial Neural Networks. SNN training usually relies on surrogate gradients because the spike function is non-differentiable, which introduces approximation errors that accumulate across layers. The authors propose a parameter reconstruction algorithm for SNN training that demonstrates consistent and significant advantages across various tasks.

Abstract ↗ PDF ↗

arXiv Research Digest

Efficient ML / Edge AI
