2025-07-02 |
AC-DiT: Adaptive Coordination Diffusion Transformer for Mobile Manipulation |
Sixiang Chen et.al. |
2507.01961v2 |
2025-07-03 |
null |
2025-07-02 |
How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks |
Rahul Ramachandran et.al. |
2507.01955v1 |
2025-07-02 |
null |
2025-07-02 |
FreeMorph: Tuning-Free Generalized Image Morphing with Diffusion Model |
Yukang Cao et.al. |
2507.01953v1 |
2025-07-02 |
null |
2025-07-02 |
LongAnimation: Long Animation Generation with Dynamic Global-Local Memory |
Nan Chen et.al. |
2507.01945v1 |
2025-07-02 |
null |
2025-07-02 |
Large Language Model-Driven Closed-Loop UAV Operation with Semantic Observations |
Wenhao Wang et.al. |
2507.01930v2 |
2025-07-03 |
null |
2025-07-02 |
3D Reconstruction and Information Fusion between Dormant and Canopy Seasons in Commercial Orchards Using Deep Learning and Fast GICP |
Ranjan Sapkota et.al. |
2507.01912v1 |
2025-07-02 |
null |
2025-07-02 |
Reasoning to Edit: Hypothetical Instruction-Based Image Editing with Visual Reasoning |
Qingdong He et.al. |
2507.01908v1 |
2025-07-02 |
null |
2025-07-02 |
An in-silico lung phantom to assess the performance of pulmonary artery segmentation using angiogram |
Sunder Neelakantan et.al. |
2507.01867v1 |
2025-07-02 |
null |
2025-07-02 |
Refining Gelfond Rationality Principle Towards More Comprehensive Foundational Principles for Answer Set Semantics |
Yi-Dong Shen et.al. |
2507.01833v1 |
2025-07-02 |
null |
2025-07-02 |
Autoadaptive Medical Segment Anything Model |
Tyler Ward et.al. |
2507.01828v1 |
2025-07-02 |
null |
2025-07-02 |
Boosting Adversarial Transferability Against Defenses via Multi-Scale Transformation |
Zihong Guo et.al. |
2507.01791v1 |
2025-07-02 |
null |
2025-07-02 |
Are Vision Transformer Representations Semantically Meaningful? A Case Study in Medical Imaging |
Montasir Shams et.al. |
2507.01788v1 |
2025-07-02 |
null |
2025-07-02 |
A Deterministic Partition Tree and Applications |
Haitao Wang et.al. |
2507.01775v1 |
2025-07-02 |
null |
2025-07-02 |
Calibrated Self-supervised Vision Transformers Improve Intracranial Arterial Calcification Segmentation from Clinical CT Head Scans |
Benjamin Jin et.al. |
2507.01744v1 |
2025-07-02 |
null |
2025-07-02 |
DeRIS: Decoupling Perception and Cognition for Enhanced Referring Image Segmentation through Loopback Synergy |
Ming Dai et.al. |
2507.01738v1 |
2025-07-02 |
null |
2025-07-02 |
Soft Self-labeling and Potts Relaxations for Weakly-Supervised Segmentation |
Zhongwen Zhang et.al. |
2507.01721v1 |
2025-07-02 |
null |
2025-07-02 |
Facial Emotion Learning with Text-Guided Multiview Fusion via Vision-Language Model for 3D/4D Facial Expression Recognition |
Muzammil Behzad et.al. |
2507.01673v1 |
2025-07-02 |
null |
2025-07-02 |
SAILViT: Towards Robust and Generalizable Visual Backbones for MLLMs via Gradual Feature Refinement |
Weijie Yin et.al. |
2507.01643v1 |
2025-07-02 |
null |
2025-07-02 |
Depth Anything at Any Condition |
Boyuan Sun et.al. |
2507.01634v1 |
2025-07-02 |
null |
2025-07-02 |
Tile and Slide: A New Framework for Scaling NeRF from Local to Global 3D Earth Observation |
Camille Billouard et.al. |
2507.01631v1 |
2025-07-02 |
null |
2025-07-02 |
Prompt Guidance and Human Proximal Perception for HOT Prediction with Regional Joint Loss |
Yuxiao Wang et.al. |
2507.01630v1 |
2025-07-02 |
null |
2025-07-02 |
Perception-Oriented Latent Coding for High-Performance Compressed Domain Semantic Inference |
Xu Zhang et.al. |
2507.01608v1 |
2025-07-02 |
null |
2025-07-02 |
Data Agent: A Holistic Architecture for Orchestrating Data+AI Ecosystems |
Zhaoyan Sun et.al. |
2507.01599v1 |
2025-07-02 |
null |
2025-07-02 |
Vision-Aided ISAC in Low-Altitude Economy Networks via De-Diffused Visual Priors |
Yulan Gao et.al. |
2507.01574v1 |
2025-07-02 |
null |
2025-07-02 |
A Gift from the Integration of Discriminative and Diffusion-based Generative Learning: Boundary Refinement Remote Sensing Semantic Segmentation |
Hao Wang et.al. |
2507.01573v1 |
2025-07-02 |
null |
2025-07-02 |
Real-Time Emergency Vehicle Siren Detection with Efficient CNNs on Embedded Hardware |
Marco Giordano et.al. |
2507.01563v1 |
2025-07-02 |
null |
2025-07-02 |
Mamba Guided Boundary Prior Matters: A New Perspective for Generalized Polyp Segmentation |
Tapas K. Dutta et.al. |
2507.01509v1 |
2025-07-02 |
null |
2025-07-02 |
Following the Clues: Experiments on Person Re-ID using Cross-Modal Intelligence |
Robert Aufschläger et.al. |
2507.01504v1 |
2025-07-02 |
null |
2025-07-02 |
Integrating Traditional and Deep Learning Methods to Detect Tree Crowns in Satellite Images |
Ozan Durgut et.al. |
2507.01502v1 |
2025-07-02 |
null |
2025-07-02 |
Representation Entanglement for Generation: Training Diffusion Transformers Is Much Easier Than You Think |
Ge Wu et.al. |
2507.01467v1 |
2025-07-02 |
null |