MECO: A Multimodal Dataset for Emotion and Cognitive Understanding in Older Adults
This research introduces a multimodal dataset for understanding emotional and cognitive states in older adults.
MECO includes 42 participants and provides approximately 38 hours of multimodal signals, yielding 30,592 synchronized samples. The modalities cover video, audio, electroencephalography (EEG), and electrocardiography (ECG). In addition, the dataset offers comprehensive annotations of emotional and cognitive states.
If It's Good Enough for You, It's Good Enough for Me: Transferability of Audio Sufficiencies across Models
This research proposes a method for speech processing.
A transferability analysis finds that transferability rates vary depending on the task. Some models, in particular on deepfake detection, exhibit different transferability behavior. We call these "flat-earther" models.
Valence-Arousal Subspace in LLMs: Circular Emotion Geometry and Multi-Behavioral Control
This research presents techniques for language AI.
We present a method to identify a valence-arousal (VA) subspace within large language model representations. Projections along our recovered VA subspace correlate with human-crowdsourced VA ratings across 44k lexical items. Steering along these directions induces near-monotonic bidirectional control over refusal and sycophancy.
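To make the projection and steering steps concrete, here is a minimal sketch. It assumes two direction vectors for valence and arousal are already available; how the subspace is actually recovered is not described in this summary. The function names, the toy 3-dimensional vectors, and the steering strength `alpha` are all illustrative assumptions.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def orthonormalize(v1, v2):
    """Gram-Schmidt: turn two raw direction vectors into an orthonormal pair."""
    n1 = math.sqrt(dot(v1, v1))
    e1 = [a / n1 for a in v1]
    overlap = dot(v2, e1)
    residual = [b - overlap * a for a, b in zip(e1, v2)]
    n2 = math.sqrt(dot(residual, residual))
    e2 = [a / n2 for a in residual]
    return e1, e2

def project_va(hidden, e_valence, e_arousal):
    """Coordinates of a hidden state in the (assumed) valence-arousal plane."""
    return dot(hidden, e_valence), dot(hidden, e_arousal)

def steer(hidden, direction, alpha):
    """Shift a hidden state along one direction by strength alpha (hypothetical knob)."""
    return [h + alpha * d for h, d in zip(hidden, direction)]

# Toy example: 3-dimensional "hidden states" stand in for real model activations.
e_val, e_aro = orthonormalize([2.0, 0.0, 0.0], [1.0, 3.0, 0.0])
h = [0.5, -1.0, 4.0]
valence, arousal = project_va(h, e_val, e_aro)
h_steered = steer(h, e_val, 2.0)
```

In this toy setup, steering along the valence direction changes only the valence coordinate, because the two directions were made orthonormal first.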
Same Feedback, Different Source: How AI vs. Human Feedback Attribution and Credibility Shape Learner Behavior in Computing Education
This research explores techniques in language AI.
AI systems increasingly take on instructional roles: providing feedback, guiding practice, and evaluating work. Does it matter to learners who they believe is on the other side? We investigated this using a three-condition experiment (N=148) in which participants completed a creative coding tutorial and received feedback generated by the same large language model attributed to either an AI system or a human teaching assistant.
User-Aware Conditional Generative Total Correlation Learning for Multi-Modal Recommendation
This research proposes a method for multi-modal recommendation.
Multi-modal recommendation (MMR) enriches item representations by introducing item content, e.g., visual and textual descriptions. Success hinges on aligning these content modalities with user preferences derived from interaction data.
Split and Conquer Partial Deepfake Speech
This research proposes a method for speech processing.
Partial deepfake speech detection requires identifying manipulated regions that may occur within short temporal portions of otherwise bona fide utterances. We propose a split-and-conquer framework that decomposes the problem into two stages: boundary detection and segment-level classification. This formulation simplifies the learning objective by separating temporal localization from authenticity assessment.
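The two-stage decomposition can be sketched as follows. This is only an illustration of the control flow, assuming per-frame boundary scores and per-frame spoof scores are produced by some upstream models; the thresholds, function names, and mean-score decision rule are hypothetical and not taken from the paper.

```python
def split_at_boundaries(n_frames, boundary_scores, threshold=0.5):
    """Stage 1: turn per-frame boundary scores into (start, end) segments."""
    cuts = [i for i, s in enumerate(boundary_scores)
            if s >= threshold and 0 < i < n_frames]
    edges = [0] + cuts + [n_frames]
    return [(a, b) for a, b in zip(edges, edges[1:]) if b > a]

def classify_segments(segments, frame_spoof_scores, threshold=0.5):
    """Stage 2: label each segment from its mean per-frame spoof score."""
    labels = []
    for start, end in segments:
        mean = sum(frame_spoof_scores[start:end]) / (end - start)
        labels.append((start, end, "spoof" if mean >= threshold else "bona fide"))
    return labels

# Toy utterance of 8 frames with a manipulated region in frames 3..5.
boundary_scores = [0.0, 0.0, 0.0, 0.9, 0.0, 0.0, 0.8, 0.0]
spoof_scores = [0.1, 0.2, 0.1, 0.9, 0.8, 0.7, 0.2, 0.1]

segments = split_at_boundaries(8, boundary_scores)
result = classify_segments(segments, spoof_scores)
```

Separating the stages means the classifier only ever sees segments assumed to be homogeneous, which is the simplification the summary describes.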
Generative AI Use in Professional Graduate Thesis Writing: Adoption, Perceived Outcomes, and the Role of a Research-Specialized Agent
This research surveys generative AI use in professional graduate thesis writing.
This paper reports a survey of generative AI use among 83 MBA thesis students in Japan. Of these, 95.2% reported at least some use and 77.1% reported heavy use. Students engaged AI across the full research-writing workflow: literature review, drafting, and consultation when stuck.
Chart-RL: Policy Optimization Reinforcement Learning for Enhanced Visual Reasoning in Chart Question Answering with Vision Language Models
This research explores techniques in language AI.
Chart-RL is a novel reinforcement learning framework that enhances VLMs' chart understanding through feedback-driven policy optimization of visual perception and logical inference. The RL fine-tuned Qwen3-VL-4B-Instruct model achieved an answer accuracy of 0.634.
Toward an Artificial General Teacher: Procedural Geometry Data Generation and Visual Grounding with Vision-Language Models
This research explores techniques in language AI.
We study visual explanation in geometry education as a Referring Image Segmentation problem. We present a fully automated procedural data engine that generates over 200,000 synthetic geometry diagrams with pixel-perfect segmentation masks and linguistically diverse referring expressions. We propose domain-specific fine-tuning of vision-language models (VLMs).
CharTool: Tool-Integrated Visual Reasoning for Chart Understanding
This research presents techniques for language AI.
DuoChart combines synthesized charts with real-world charts to construct diverse, high-quality chart training data. CharTool-7B outperforms the base model by **+8.0%** on CharXiv (Reasoning) and **+9.78%** on ChartQAPro.
Coupled Control, Structured Memory, and Verifiable Action in Agentic AI (SCRAT -- Stochastic Control with Retrieval and Auditable Trajectories): A Comparative Perspective from Squirrel Locomotion and Scatter-Hoarding
This research explores techniques in machine learning.
Squirrel ecology offers a sharp comparative case because arboreal locomotion, scatter-hoarding, and audience-sensitive caching couple all three demands in one organism. We introduce a minimal hierarchical partially observed control model with latent dynamics and structured episodic memory.
InCoder-32B-Thinking: Industrial Code World Model for Thinking
This research presents a reasoning model for industrial code generation.
Industrial software development across chip design, GPU optimization, and embedded systems lacks expert reasoning traces showing how engineers reason about hardware constraints and timing semantics. We propose InCoder-32B-Thinking, trained on data from the Error-driven Chain-of-Thought (ECoT) synthesis framework with an industrial code world model (ICWM) to generate reasoning traces.
Speaker-Reasoner: Scaling Interaction Turns and Reasoning Patterns for Timestamped Speaker-Attributed ASR
This research explores techniques in language AI.
Multi-speaker scenarios remain challenging due to overlapping speech, backchannels, rapid turn-taking, and context window constraints. We propose Speaker-Reasoner, an end-to-end Speech LLM with agentic multi-turn temporal reasoning.
Comparing the Impact of Pedagogy-Informed Custom and General-Purpose GAI Chatbots on Students' Science Problem-Solving Processes and Performance Using Heterogeneous Interaction Network Analysis
This research compares pedagogy-informed custom and general-purpose generative AI chatbots in science problem solving.
Problem solving plays an essential role in science education. Generative AI (GAI) chatbots have emerged as a promising tool for supporting students' science problem solving. However, general-purpose chatbots (e.g., ChatGPT) often provide direct, ready-made answers, which may lead to cognitive offloading.
R2-Write: Reflection and Revision for Open-Ended Writing with Deep Reasoning
This research improves language AI.
R2-Write is an automated framework that synthesizes high-quality thinking trajectories enriched with explicit reflection and revision patterns. To prevent redundant reflections, we design a process reward mechanism that supervises reflection quality during reinforcement learning.
Learning from Synthetic Data via Provenance-Based Input Gradient Guidance
This research improves computer vision.
Learning methods using synthetic data have attracted attention as an effective approach for increasing the diversity of training data while reducing collection costs. Many existing methods improve robustness only indirectly through diversification of training samples and do not explicitly teach the model which regions in the input space truly contribute to discrimination.
Council Mode: Mitigating Hallucination and Bias in LLMs via Multi-Agent Consensus
This research proposes a method for mitigating hallucination and bias in LLMs.
Council Mode is a novel multi-agent consensus framework. It dispatches queries to multiple heterogeneous frontier LLMs in parallel and synthesizes their outputs through a dedicated consensus model. The Council pipeline operates in three phases: (1) an intelligent triage classifier that routes queries based on complexity, (2) parallel expert generation across architecturally diverse models, and (3) structured consensus synthesis.
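The three phases can be sketched as a small pipeline. This is an illustrative skeleton only: the word-count triage rule, the choice to answer simple queries with a single expert, and the majority-vote synthesizer are all stand-in assumptions, and the "experts" are plain Python callables rather than real LLM clients.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def triage(query, word_limit=8):
    """Phase 1 (assumed heuristic): route by a crude complexity proxy."""
    return "complex" if len(query.split()) > word_limit else "simple"

def majority_vote(query, drafts):
    """Phase 3 stand-in: pick the most common draft instead of a consensus model."""
    return Counter(drafts).most_common(1)[0][0]

def council_answer(query, experts, synthesize=majority_vote):
    # Assumption: simple queries skip the council and go to the first expert.
    if triage(query) == "simple":
        return experts[0](query)
    # Phase 2: query all experts in parallel; map() preserves expert order.
    with ThreadPoolExecutor(max_workers=len(experts)) as pool:
        drafts = list(pool.map(lambda expert: expert(query), experts))
    return synthesize(query, drafts)

# Hypothetical experts that return fixed answers.
experts = [lambda q: "B", lambda q: "A", lambda q: "A"]

simple_out = council_answer("Capital of France?", experts)
complex_out = council_answer(
    "Compare the long term economic effects of two different carbon tax designs",
    experts,
)
```

With these stubs, the short query is answered by the first expert alone, while the long query goes through all three experts and the vote.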
SentiAvatar: Towards Expressive and Interactive Digital Humans
This research presents a framework for expressive, interactive digital humans.
We present SentiAvatar, a framework for building expressive interactive 3D digital humans, and use it to create SuSu, a virtual character that speaks, gestures, and emotes in real time. The source code, model, and dataset are available at https://sentiavatar.io.
High-resolution probabilistic estimation of three-dimensional regional ocean dynamics from sparse surface observations
This research presents techniques for machine learning.
The ocean interior regulates Earth's climate but remains sparsely observed due to limited in situ measurements. We present a depth-aware generative framework for reconstructing high-resolution ocean states from extremely sparse surface data. The framework accurately reconstructs subsurface temperature, salinity, and velocity fields across multiple depths.
ESL-Bench: An Event-Driven Synthetic Longitudinal Benchmark for Health Agents
This research explores techniques in language AI.
Longitudinal health agents must reason across multi-source trajectories that combine continuous device streams, sparse clinical exams, and episodic life events. We present ESL-Bench, an event-driven synthesis framework and benchmark providing 100 synthetic users. Users are paired with 100 evaluation queries across five dimensions (Lookup, Trend, Comparison, Anomaly, Explanation), stratified into Easy, Medium, and Hard tiers.
NavCrafter: Exploring 3D Scenes from a Single Image
This research introduces a new approach to computer vision.
NavCrafter explores 3D scenes from a single image by synthesizing novel-view video sequences with camera controllability and temporal-spatial consistency. The framework leverages video diffusion models to capture rich 3D priors and adopts a geometry-aware expansion strategy to progressively extend scene coverage. We further propose a collision-aware camera trajectory planner and an enhanced 3D Gaussian Splatting pipeline.
ChatSVA: Bridging SVA Generation for Hardware Verification via Task-Specific LLMs
This research enhances language AI.
ChatSVA is an end-to-end SVA generation system built upon a multi-agent framework. The AgentBridge platform enables this approach by systematically generating high-purity datasets, overcoming the data scarcity inherent to few-shot scenarios. The online service has been publicly released.
Help Converts Newcomers, Not Veterans: Generalized Reciprocity and Platform Engagement on Stack Overflow
This research studies generalized reciprocity and platform engagement on Stack Overflow.
Generalized reciprocity, the tendency to help others after receiving help oneself, is widely theorized as a mechanism sustaining cooperation on online knowledge-sharing platforms. Yet robust empirical evidence from field settings remains surprisingly scarce. Using Cox proportional hazards models on over 21 million questions, we find that receiving an answer significantly increases a user's propensity to help other users. This effect is concentrated among newcomers and declines with platform experience.
Domain-Adapted Retrieval for In-Context Annotation of Pedagogical Dialogue Acts
This research presents techniques for language AI.
We present a domain-adapted RAG pipeline for tutoring move annotation. We adapt retrieval by fine-tuning a lightweight embedding model on tutoring corpora and indexing dialogues at the utterance level to retrieve labeled demonstrations. Retrieval corrects systematic label biases present in zero-shot prompting.
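A rough sketch of utterance-level retrieval of labeled demonstrations for an in-context prompt is shown below. A bag-of-words cosine similarity stands in for the fine-tuned embedding model, which is not available here; the example utterances, label names, and prompt layout are hypothetical.

```python
import math
from collections import Counter

def embed(text):
    """Stand-in embedding: token counts (a real pipeline would use a trained encoder)."""
    return Counter(text.lower().split())

def cosine(a, b):
    num = sum(a[t] * b[t] for t in a if t in b)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def retrieve(query, labeled_utterances, k=2):
    """Return the k labeled utterances most similar to the query."""
    q = embed(query)
    ranked = sorted(labeled_utterances,
                    key=lambda pair: cosine(q, embed(pair[0])),
                    reverse=True)
    return ranked[:k]

def build_prompt(query, demos):
    """Format retrieved demonstrations followed by the utterance to annotate."""
    blocks = [f"Utterance: {u}\nLabel: {label}" for u, label in demos]
    blocks.append(f"Utterance: {query}\nLabel:")
    return "\n\n".join(blocks)

# Hypothetical utterance-level index with made-up dialogue-act labels.
index = [
    ("can you explain your answer", "probing"),
    ("great job on that step", "praise"),
    ("what is two plus two", "question"),
]
demos = retrieve("can you explain that step", index, k=2)
prompt = build_prompt("can you explain that step", demos)
```

The retrieved demonstrations give the annotating model nearby labeled examples, which is the mechanism the summary credits with correcting zero-shot label biases.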
A Data-Centric Vision Transformer Baseline for SAR Sea Ice Classification
This research establishes a Vision Transformer baseline for SAR sea ice classification.
Synthetic Aperture Radar (SAR) is the operational standard because of its all-weather capability, but it remains challenging to distinguish morphologically similar ice classes under severe class imbalance. This paper establishes a trustworthy SAR-only baseline that future fusion work can build upon.