Position: AI as Part of Self -- Extending the Mind Requires Cognitive Co-Regulation
💡 This research argues that AI which extends the mind requires cognitive co-regulation.
Contemporary AI increasingly participates in attention allocation, reasoning, synthesis, and decision-making, shaping the very cognitive processes through which humans form beliefs, make decisions, and constitute their sense of self. We identify the risks of unstructured delegation: deskilling, automation bias, transfer of epistemic authority, and oracle-style centralization of knowledge.
SLIP & ETHICS: Graduated Intervention for AI Emotional Companions
💡 This research presents a graduated intervention protocol for AI emotional companions.
AI emotional companions face a safety-rapport paradox: restrictive safeguards can damage the supportive alliance, while permissive systems risk user harm. SLIP (Staged Layers of Intervention Protocol) is a four-stage graduated methodology that derives interventions from structured qualitative indicators.
Designing for Robot Wranglers: A Synthesis of Literature and Practice
💡 This research synthesizes literature and practice on the emerging robot wrangler role in human-robot interaction.
Robots are increasingly present in human spaces, for example conducting deliveries in hospitals, interacting with visitors at museums, and stocking items in warehouses. To ensure the seamless integration of robots into these spaces, a new role in human-robot interaction is emerging: the robot wrangler.
Designing Datacenter Power Delivery Hierarchies for the AI Era
💡 This research examines power delivery hierarchy design for AI datacenters.
Demand for AI accelerators is rapidly increasing rack power density, with projections approaching 1 MW per deployment by 2027. Power utilization is particularly important because grid power capacity is a scarce resource in the AI era. Designing a power delivery hierarchy that stays efficient over the long run is difficult.
Evaluating Design Video Generation: Metrics for Compositional Fidelity
💡 This research proposes metrics for evaluating compositional fidelity in generated design videos.
Generative video models are increasingly used in design animation tasks. Unlike natural video generation, design animation imposes structured constraints. Specific components must animate with prescribed motion types, directions, speeds, and timing. Non-animated regions must remain stable, and layout structure must be preserved.
ARIA: A Diagnostic Framework for Music Training Data Attribution
💡 This research presents a diagnostic framework for training data attribution in music generation.
Training data attribution (TDA) for music generation must answer two questions that copyright analysis requires. Existing methods reduce influence to a single scalar, without revealing which musical aspects dominate that influence. We propose ARIA, a framework that decomposes attribution along musical aspects (five for symbolic music, three for audio).
GenShield: Unified Detection and Artifact Correction for AI-Generated Images
💡 This research unifies detection and artifact correction for AI-generated images.
Diffusion-based image synthesis has made AI-generated images (AIGI) increasingly photorealistic, raising concerns about authenticity in applications such as misinformation detection, digital forensics, and content moderation. We propose GenShield, a unified autoregressive framework that jointly performs explainable AIGI detection and controllable artifact correction.
GEMS -- Guided Evolutionary Molecule Design for Sustainable Chemicals
💡 This research presents an interactive visual analytics tool for sustainable molecule design.
Machine learning (ML) methods have been developed to aid de novo molecule design. Data on the environmental impacts of chemical compounds are sparse, resulting in low-fidelity ML oracles and unreliable candidate proposals. We present GEMS, an interactive visual analytics tool that enables domain experts to collaborate directly with a genetic algorithm for molecule design. Users can integrate their expert knowledge to guide the evolutionary process by modifying the scoring function and the molecule population, without programming knowledge.
Synchronized Realities: Towards Magic Mobile Experiences through Aligned AR
💡 This research explores aligned AR experiences enabled by generative AI.
In virtual reality environments, the alignment of perceptual modalities is crucial for immersion and presence. In the AR domain, such alignment is difficult to create because elements in the physical world are often beyond the user's control. Recent advances in generative AI enable on-demand content creation, making highly reactive AR experiences possible.
Property-Guided LLM Program Synthesis for Planning
💡 This research uses formally defined properties to guide LLM program synthesis for planning.
LLMs have shown impressive success in program synthesis, discovering programs that surpass prior solutions. Instead of scoring programs after evaluation, we check whether a candidate satisfies a formally defined property. When the property is violated, we stop evaluation early and provide the LLM with a concrete counterexample showing how the program failed. This feedback drastically reduces both the number of program generations and the evaluation cost.
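The check-then-counterexample loop described above can be illustrated with a minimal sketch. This is a toy example, not the authors' implementation: the names `find_counterexample`, `buggy_sort`, and `is_sorted_permutation` are hypothetical, and a sorting task stands in for the planning domain.

```python
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


def find_counterexample(
    candidate: Callable[[T], object],
    holds: Callable[[T, object], bool],
    cases: Iterable[T],
) -> Optional[T]:
    """Return the first input on which the property fails, or None."""
    for case in cases:
        if not holds(case, candidate(case)):
            return case  # stop early: later cases are never evaluated
    return None


# Hypothetical candidate program: meant to sort, but it drops duplicates.
def buggy_sort(xs):
    return sorted(set(xs))


# Hypothetical property: the output is a sorted permutation of the input.
def is_sorted_permutation(xs, out):
    return list(out) == sorted(xs)


cases = [[3, 1, 2], [2, 2, 1], [5, 4]]
counterexample = find_counterexample(buggy_sort, is_sorted_permutation, cases)

# In a synthesis loop, this string would be sent back to the LLM as feedback.
feedback = f"Property violated on input {counterexample!r}"
```

Here the second case exposes the bug, so the third case is never run; a concrete failing input is what gets reported back rather than an aggregate score.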
Generative Long-term User Interest Modeling for Click-Through Rate Prediction
💡 This research models long-term user interests generatively for click-through rate prediction.
Modeling long-term user interests from massive historical user behaviors enhances click-through rate (CTR) prediction performance in advertising and recommendation systems. GenLI consists of an interest generation module (IGM), a behavior retrieval module (BRM), and an interest fusion module (IFM).
VideoSeeker: Incentivizing Instance-level Video Understanding via Native Agentic Tool Invocation
💡 This research integrates agentic tool invocation into instance-level video understanding.
Large Vision-Language Models (LVLMs) have shown significant progress in video understanding, but they face substantial challenges in tasks requiring precise spatiotemporal localization at the instance level. Existing methods primarily rely on text prompts for human-model interaction, but these prompts struggle to provide precise spatial and temporal references. VideoSeeker integrates agentic reasoning with instance-level video understanding tasks.
Ada-Diffuser: Latent-Aware Adaptive Diffusion for Decision-Making
💡 This research presents a latent-aware adaptive diffusion model for decision-making.
Ada-Diffuser is a causal diffusion model that simultaneously learns the temporal structure of observed interactions and the underlying latent dynamics. The model leverages these dynamics for planning and control tasks. Its modular design supports both planning and policy learning.
Reasoners or Translators? Contamination-aware Evaluation and Neuro-Symbolic Robustness in Tax Law
💡 This research evaluates whether LLM performance on tax law reflects legal reasoning or data contamination.
Recent advances in large language models (LLMs) have significantly enhanced automated legal reasoning. Yet it remains unclear whether their performance reflects genuine legal reasoning ability or artifacts of data contamination. We show that performance can be inflated by contamination.
XSearch: Explainable Code Search via Concept-to-Code Alignment
💡 This research presents an explainable code search framework based on concept-to-code alignment.
Semantic code search has been widely adopted in both academia and industry. These approaches embed natural-language queries and code snippets into a shared embedding space and retrieve results based on vector similarity. However, they often suffer from poor explainability and generalization. We propose XSearch, an intrinsically explainable code search framework.
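The baseline retrieval step described above (embed, then rank by vector similarity) can be sketched as follows. This illustrates the conventional approach the abstract critiques, not XSearch itself. The three-dimensional vectors are hand-written stand-ins for the output of an embedding model, and `cosine`, `search`, and the snippet names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def search(query_vec, index, top_k=2):
    """Rank indexed snippets by similarity to the query vector."""
    scored = [(cosine(query_vec, vec), name) for name, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:top_k]]


# Assumed embeddings; a real system would compute these with a trained encoder.
index = {
    "read_file": [0.9, 0.1, 0.0],
    "parse_json": [0.2, 0.9, 0.1],
    "send_request": [0.0, 0.2, 0.9],
}
query = [0.8, 0.3, 0.0]  # assumed embedding of a natural-language query

results = search(query, index)
```

The ranking is a single similarity score per snippet, which is why such systems give little indication of why a result matched.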
Constrained latent state modeling: A unifying perspective on representation learning under competing constraints
💡 This research proposes a unifying perspective on representation learning under competing constraints.
Learning latent representations from complex data is central to modern machine learning. In such settings, representations are better understood as latent states capturing underlying system dynamics. Yet current approaches remain fragmented, relying on distinct assumptions about what these states should represent. We propose constrained latent state modeling (CLSM) as a unifying perspective.
Beyond Content: A Comprehensive Speech Toxicity Dataset and Detection Framework Incorporating Paralinguistic Cues
💡 This research presents a speech toxicity dataset and detection framework that incorporates paralinguistic cues.
Current toxic speech datasets are predominantly text-based, limiting the development of models that can capture paralinguistic cues. We present ToxiAlert-Bench, a large-scale audio dataset with over 30,000 audio clips annotated with seven major toxic categories and twenty fine-grained toxic labels.
Driving Through the Network: Performance and Workload Under Latency and Video Impairment
💡 This research studies driving performance and workload under network latency and video impairment.
We report a fixed-base driving-simulator study with a 2x2 manipulation of added latency (100/300 ms) and bitrate (500/2000 kbit/s). We measured effective glass-to-glass (G2G) latency per condition. Physiological measures (heart rate, RR interval, skin conductance) exhibited sub-additive interactions, whereas performance and oculomotor interactions were small or non-significant.
Prospective multi-pathogen disease forecasting using autonomous LLM-guided tree search
💡 This research uses LLM-guided tree search to automate multi-pathogen disease forecasting.
Probabilistic forecasting of infectious diseases is crucial for public health but relies on labor-intensive manual model curation by expert modeling teams. Here, we present an autonomous system that uses Large Language Model (LLM)-guided tree search to generate, evaluate, and optimize executable forecasting software.
Inside Baseball: The Automated Ball-Strike System as an Object Lesson in Technological Rule Enforcement
💡 This research examines the Automated Ball-Strike System as a case study in technological rule enforcement.
Major League Baseball's seven-year experimentation with the Automated Ball-Strike System (ABS) shows how even seemingly straightforward rules require a complex translation process to operationalize via technological systems. ABS is envisioned to call balls and strikes accurately: a seemingly straightforward use of technology to objectively determine the distance between a pitch and the strike zone.
An Algebraic Exposition of the Theory of Dyadic Morality
💡 This research formalizes the theory of dyadic morality algebraically.
This paper provides an algebraic exposition of the theory of dyadic morality (TDM). We formalize TDM using structural causal modeling (SCM) notation. This algebraic formalization enables neurosymbolic AI systems to compute morality in a way that is both mathematically rigorous and faithful to human moral cognition.
Entropy Across the Bridge: Conditional-Marginal Discretization for Flow and Schrödinger Samplers
💡 This research derives an entropy-based inference-time discretization for flow and Schrödinger bridge samplers.
Flow matching and Schrödinger bridges define probability paths, yet their inference grids are usually heuristic or inherited from one-endpoint diffusion. We derive a conditional-marginal entropy-rate objective for bridge-aware discretization. We use it to build a training-free entropic inference-time scheduler from first principles.
ShopGym: An Integrated Framework for Realistic Simulation and Scalable Benchmarking of E-Commerce Web Agents
💡 This research introduces a framework for simulating and benchmarking e-commerce web agents.
The field lacks a scalable way to construct evaluation settings that are realistic, diverse, controllable, inspectable, and reproducible. We introduce ShopGym, an integrated framework for realistic simulation and scalable benchmarking of e-commerce web agents. We validate the framework through graph-based structural analysis and agent-based behavioral evaluation.
Sign-Separated Finite-Time Error Analysis of Q-Learning
💡 This research presents a finite-time error analysis of Q-learning.
This paper develops a sign-separated finite-time error analysis for constant step-size Q-learning. The analysis identifies a max-induced asymmetry in the error dynamics. Negative errors admit an optimal-policy lower comparison.
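For context, the standard tabular Q-learning update with a constant step size can be sketched as below. This is the textbook update rule, not the paper's analysis; the function name `q_update`, the state and action labels, and the values of `alpha` and `gamma` are illustrative assumptions. The `max` in the target is the operator the abstract's "max-induced asymmetry" refers to.

```python
def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one Q-learning step with constant step size alpha."""
    # Bootstrapped target: reward plus discounted best next-state value.
    target = reward + gamma * max(q[next_state].values())
    # Move the current estimate a fixed fraction alpha toward the target.
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]


# Hypothetical two-state table with hand-picked initial values.
q = {
    "s0": {"left": 0.0, "right": 0.0},
    "s1": {"left": 0.0, "right": 1.0},
}

new_value = q_update(q, "s0", "right", reward=1.0, next_state="s1")
```

With these numbers the target is 1.0 + 0.9 * 1.0 = 1.9, so the estimate moves from 0.0 to about 0.19.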
Multi-level Self-supervised Pretraining on Compositional Hierarchical Graph for Molecular Property Prediction
💡 This research proposes multi-level self-supervised pretraining on a hierarchical graph for molecular property prediction.
Self-supervised pretraining on molecular graphs has emerged as a promising approach for molecular property prediction. Most existing methods operate at a single structural granularity and treat bond information as auxiliary edge attributes rather than as an independent semantic layer. We propose MolCHG, a multi-level self-supervised pretraining framework built upon a novel Compositional Hierarchical Graph.