Uncertainty-Aware Structured Data Extraction from Full CMR Reports via Distilled LLMs
💡 This research presents techniques for edge computing.
CMR-EXTR is a lightweight framework that converts free-text CMR reports into structured data and assigns per-field confidence for quality control. A teacher-student distillation pipeline enables fully offline inference while limiting manual annotation.
Trajectory as the Teacher: Few-Step Discrete Flow Matching via Energy-Navigated Distillation
💡 This research explores techniques in language AI.
Distillation uses the multi-step trajectory to train a student to reproduce the process in a few steps. Trajectory-Shaped Discrete Flow Matching (TS-DFM) replaces these blind jumps with guided navigation. The shaped student at 8 steps achieves 32% lower perplexity than the 1,024-step teacher while being 128x faster.
Conformal Path Reasoning: Trustworthy Knowledge Graph Question Answering via Path-Level Calibration
💡 This research presents techniques for trustworthy question answering over knowledge graphs.
Conformal Path Reasoning (CPR) is a trustworthy KGQA framework with two key innovations. CPR significantly improves the Empirical Coverage Rate by 34% while reducing average prediction set size by 40% compared to conformal baselines. CPR is a lightweight module trained via PUCT-guided exploration to learn discriminative path-level nonconformity scores.
Semiparametric Efficient Test for Interpretable Distributional Treatment Effects
💡 This research explores techniques in statistical inference.
DR-ME is the first semiparametrically efficient finite-location test for interpretable distributional treatment effects. The test evaluates an interventional kernel witness at learned outcome locations rather than reporting only a global rejection. The results show near-nominal type-I error and competitive power against global doubly robust kernel tests.
Graph-Structured Hyperdimensional Computing for Data-Efficient and Explainable Process-Structure-Property Prediction
💡 This research presents techniques for data-efficient machine learning.
PSP-HDC is a graph-structured hyperdimensional computing framework that encodes a directed PSP graph as an internal prior for representation, inference, and explanation. It achieves an accuracy of 0.910 ± 0.077 over 1000 random splits and 0.896 under process-fold generalization.
Bayesian Sensitivity of Causal Inference Estimators under Evidence-Based Priors
💡 This research explores techniques in causal inference.
Causal inference relies on untestable assumptions about the true data-generating process. Sensitivity analysis helps determine how robust our conclusions are when we alter these underlying assumptions.
EmambaIR: Efficient Visual State Space Model for Event-guided Image Reconstruction
💡 This research explores techniques in computer vision.
EmambaIR is an efficient visual State Space Model designed for image reconstruction using spatially sparse and temporally continuous event streams. The framework introduces two key components: the cross-modal Top-k Sparse Attention Module (TSAM) and the Gated State-Space Module (GSSM). The source code and data are publicly available at: https://github.com/YunhangWickert/EmambaIR.
Proxy3D: Efficient 3D Representations for Vision-Language Models via Semantic Clustering and Alignment
💡 This research presents techniques for language AI.
Spatial intelligence in vision-language models is attracting research interest, driven by the practical demand to reason in the 3D world. Most existing methods follow the conventional 2D pipeline in VLMs and use pixel-aligned representations for the vision modality. We propose Proxy3D, a method with compact yet comprehensive 3D proxy representations.
Flow-OPD: On-Policy Distillation for Flow Matching Models
💡 This research optimizes language AI.
Flow-OPD is the first unified post-training framework that integrates on-policy distillation into Flow Matching models. It adopts a two-stage alignment strategy: it first cultivates domain-specialized teacher models via single-reward GRPO fine-tuning, and then establishes a robust initial policy through a Flow-based Cold-Start scheme.
CA-SQL: Complexity-Aware Inference Time Reasoning for Text-to-SQL via Exploration and Compute Budget Allocation
💡 This research improves language AI.
CA-SQL is a novel Text-to-SQL pipeline that uses the estimated difficulty of a task to dynamically scale the breadth of exploration when generating solution candidates. CA-SQL achieves a state-of-the-art score of 51.72% on the "challenging" tier of BIRD development set problems, using only GPT-4o-mini.
Accurate and Efficient Statistical Testing for Word Semantic Breadth
💡 This research presents techniques for machine learning.
A word type can be represented as a cloud of token vectors, with dispersion-based statistics serving as proxies for contextual diversity. We propose a Householder-aligned permutation test to isolate dispersion differences from directional differences. Empirically, our alignment reduced Type-I error by 32.5% while preserving sensitivity to genuine breadth differences.
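The summary does not spell out how the alignment is constructed, so the following is only a minimal sketch of the standard Householder reflection that such a test could build on. The function name `householder_reflect` is hypothetical. A Householder reflection is orthogonal, so it preserves all pairwise distances (and therefore any dispersion statistic) while mapping one unit direction onto another; that this is how the paper separates dispersion from direction is an assumption.

```python
def householder_reflect(x, u, v):
    """Apply to x the Householder reflection that maps unit vector u onto unit vector v.

    Illustrative helper (not from the paper). The reflection is
    H = I - 2 w w^T / (w^T w) with w = u - v, which satisfies H u = v
    whenever u and v have the same norm.
    """
    w = [ui - vi for ui, vi in zip(u, v)]
    ww = sum(wi * wi for wi in w)
    if ww == 0.0:
        return list(x)  # u and v already coincide, nothing to reflect
    scale = 2.0 * sum(wi * xi for wi, xi in zip(w, x)) / ww
    return [xi - scale * wi for xi, wi in zip(x, w)]
```

Applying the same reflection to every token vector of one word would rotate its cloud so that its mean direction matches the other word's, leaving the spread of the cloud unchanged.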
SphereVAD: Training-Free Video Anomaly Detection via Geodesic Inference on the Unit Hypersphere
💡 This research introduces a new approach to computer vision.
Video anomaly detection (VAD) aims to automatically identify events that deviate from normal patterns in untrimmed surveillance videos. Existing methods universally depend on large-scale annotations or task-specific training procedures, severely limiting their rapid deployment to novel scenes. We propose SphereVAD, a fully training-free, zero-shot VAD framework.
Where's the Plan? Locating Latent Planning in Language Models with Lightweight Mechanistic Interventions
💡 This research presents techniques for language AI.
Future-rhyme information is linearly decodable at the line boundary, with signal that strengthens with scale in all three families. Only Gemma-3-27B causally relies on this encoding, exhibiting a handoff in which the causal driver migrates to the rhyme word around layer 30.
It Just Takes Two: Scaling Amortized Inference to Large Sets
💡 This research speeds up amortized inference in machine learning.
The method trains a mean-pool Deep Set on sets of size at most two, producing an encoder that generalizes to arbitrary set sizes. The inference head is finetuned on pre-aggregated embeddings, making training cost essentially independent of the deployment set size N.
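To make the mean-pooling idea concrete, here is a minimal sketch assuming a toy, hand-written per-element encoder. The names `embed`, `mean_pool`, and `merge_pooled` are hypothetical and stand in for whatever trained network the paper uses. It shows two properties the summary relies on: the pooled embedding has the same shape for any set size, and pooled means can be combined without revisiting the raw elements.

```python
def embed(x):
    """Hypothetical per-element encoder; a trained network would go here."""
    return [x, x * x]

def mean_pool(elements):
    """Permutation-invariant set embedding: the average of element embeddings."""
    vecs = [embed(x) for x in elements]
    n = len(vecs)
    return [sum(v[d] for v in vecs) / n for d in range(len(vecs[0]))]

def merge_pooled(mean_a, n_a, mean_b, n_b):
    """Combine two pre-aggregated means, weighting each by its set size."""
    total = n_a + n_b
    return [(a * n_a + b * n_b) / total for a, b in zip(mean_a, mean_b)]
```

Because the downstream head only ever sees a fixed-size pooled vector, it can be trained on cached aggregates regardless of how many elements went into them.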
Statistical inference with belief functions: A survey
💡 This research explores techniques in machine learning.
Belief functions are a powerful framework for the mathematical characterisation of uncertainty. The first step in a reasoning chain based on belief functions is inference: how to learn a belief measure from the available data.
Normalizing Trajectory Models
💡 This research tackles the problem of computer vision.
Normalizing Trajectory Models (NTM) models each reverse step as an expressive conditional normalizing flow with exact likelihood training. NTM combines shallow invertible blocks within each step with a deep parallel predictor across the trajectory.
Beyond Pairs: Your Language Model is Secretly Optimizing a Preference Graph
💡 This research optimizes language AI.
Direct Preference Optimization (DPO) aligns language models using pairwise preference comparisons. However, in many practical settings, training data consists of multiple rollouts per prompt, inducing rich preference structure that DPO fails to exploit. We propose Graph Direct Preferential Optimization, a principled generalization of DPO that operates over directed acyclic preference graphs.
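For reference, the pairwise objective that the summary describes as the starting point can be sketched as follows. This is the standard DPO loss for a single (chosen, rejected) pair, not the graph-based generalization proposed in the paper; the function name `dpo_pair_loss` and the default `beta=0.1` are illustrative assumptions.

```python
import math

def dpo_pair_loss(logp_chosen, logp_rejected,
                  ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard pairwise DPO loss for one (chosen, rejected) pair.

    Inputs are sequence log-probabilities under the policy and under the
    frozen reference model. beta=0.1 is an illustrative default.
    """
    # Implicit reward margin: how much more the policy favors the chosen
    # response over the rejected one, relative to the reference model.
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    z = beta * margin
    # -log(sigmoid(z)) equals log(1 + exp(-z)); for very negative z this is
    # approximately -z, which avoids overflow in exp.
    return math.log1p(math.exp(-z)) if z > -30.0 else -z
```

With several rollouts per prompt, this loss only ever compares two responses at a time, which is the limitation the summary points to.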
Rethinking Dense Optical Flow without Test-Time Scaling
💡 This research improves efficiency in computer vision.
Recent progress in dense optical flow has been driven by increasingly complex architectures and multi-step refinement for test-time scaling. While these approaches achieve strong benchmark performance, they also require substantial computation during inference. We argue that powerful visual semantic and geometric priors encoded in modern foundation models can reduce the need for computationally expensive iterative refinement at test time.
Self-Play Enhancement via Advantage-Weighted Refinement in Online Federated LLM Fine-Tuning with Real-Time Feedback
💡 This research improves language AI.
SPEAR (Self-Play Enhancement via Advantage-Weighted Refinement) is an efficient online learning algorithm for federated LLM fine-tuning. SPEAR utilizes a feedback-guided self-play loop to construct naturally contrastive pairs per prompt.
LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling
💡 This research improves language AI.
Test-time scaling (TTS) has become an effective approach for improving large language model performance. But existing TTS strategies are largely hand-crafted: researchers manually design reasoning patterns and tune heuristics by intuition. We propose an environment-driven framework, AutoTTS, that changes what researchers design. The discovery environment must make the control space tractable and provide cheap, frequent feedback.
GRAPHLCP: Structure-Aware Localized Conformal Prediction on Graphs
💡 This research presents techniques for uncertainty quantification in machine learning.
Conformal prediction (CP) provides a distribution-free approach to uncertainty quantification with finite-sample guarantees. The combinatorial nature of graphs often leads to insufficiently certain predictions and indiscriminative embeddings. Existing methods primarily rely on embedding-space proximity for localization. We propose GRAPHLCP, a proximity-based localized CP framework.
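As background for the finite-sample guarantee mentioned above, here is a minimal sketch of plain split conformal prediction for classification. It is not GRAPHLCP and involves no graph structure or localization; the function names and the choice of `1 - p` as the nonconformity score are assumptions made for illustration.

```python
import math

def conformal_threshold(cal_scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(cal_scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # 1-indexed order statistic
    if rank > n:
        return math.inf  # too few calibration points: keep every label
    return sorted(cal_scores)[rank - 1]

def prediction_set(label_probs, threshold):
    """Keep every label whose nonconformity score 1 - p is within the threshold."""
    return {label for label, p in label_probs.items() if 1.0 - p <= threshold}
```

Under exchangeability of calibration and test points, sets built this way contain the true label with probability at least 1 - alpha. Localized variants reweight or restrict the calibration scores to points near the test point, which is where the choice of proximity measure matters.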
MoCoTalk: Multi-Conditional Diffusion with Adaptive Router for Controllable Talking Head Generation
💡 This research presents techniques for computer vision.
Talking-head generation requires joint modeling of identity, head pose, facial expression, and mouth dynamics. Existing methods typically address only a subset of these factors. We present MoCoTalk, a multi-conditional video diffusion framework.
Fast Byte Latent Transformer
💡 This research tackles the problem of language AI.
Recent byte-level language models match the performance of token-level models without relying on subword vocabularies. We address this bottleneck in the Byte Latent Transformer through new training and generation techniques.
Object Hallucination-Free Reinforcement Unlearning for Vision-Language Models
💡 This research presents techniques for language AI.
Vision-language models (VLMs) raise growing concerns about privacy, copyright, and bias, motivating machine unlearning to remove sensitive knowledge. Existing methods primarily fine-tune the language decoder, leading to superficial forgetting that fails to erase underlying visual representations. We propose HFRU, a reinforcement unlearning framework that operates on the vision encoder for deep semantic removal.
Globally Optimal Training of Spiking Neural Networks via Parameter Reconstruction
💡 This research makes machine learning more efficient.
Spiking Neural Networks (SNNs) have been proposed as biologically plausible and energy-efficient alternatives to conventional Artificial Neural Networks. SNN training usually relies on surrogate gradients because the spike function is non-differentiable, which introduces approximation errors that accumulate across layers. We propose a parameter reconstruction algorithm for SNN training that demonstrates consistent and significant advantages across various tasks.