Xi'an Institute of Optics and Fine Mechanics, Chinese Academy of Sciences, Makes New Progress in Zero-Shot Anomaly Detection

Recently, the research team led by Researcher Wang Quan in the Spectral Imaging Technology Research Office of the Xi'an Institute of Optics and Fine Mechanics (XIOPM), Chinese Academy of Sciences, made new progress in zero-shot anomaly detection and localization in computer vision. The work has been accepted by the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2026). The first author is Hu Ming, a master's student of XIOPM's 2024 cohort. The corresponding authors are Dr. Hu Cong of Zhongnan Hospital of Wuhan University and Researchers Hu Bingliang and Wang Quan of XIOPM. XIOPM is the first corresponding institution.

As demand grows in applications such as industrial quality inspection and medical image analysis, anomaly detection is attracting more and more attention. In practice, however, abnormal samples are scarce or even impossible to obtain, so traditional supervised methods that depend on annotated data run into a bottleneck.

Zero-shot anomaly detection methods based on vision-language models draw on knowledge from large-scale pre-training and can detect anomalies without any anomaly labels. In fine-grained anomaly detection, however, these methods still face three major challenges. First, the model struggles to separate foreground objects from complex backgrounds, so anomaly features become mixed with background features, which lowers detection accuracy. Second, a single text representation has limited semantic expressiveness and offers only an imprecise basis for judging anomalies. Third, the semantic matching between images and text during cross-modal alignment is uncertain, which limits further gains in performance.
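For readers unfamiliar with the baseline these challenges refer to, the sketch below shows one common way a CLIP-style model scores anomalies without labels: each image patch embedding is compared with a "normal" and an "abnormal" text prompt embedding, and a softmax over the two similarities gives a per-patch anomaly probability. This is a generic illustration, not FB-CLIP's code. The function names, the temperature of 0.07, the prompt wording and the random stand-in embeddings are all assumptions; a real system would take its features from a pre-trained CLIP image and text encoder.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    # Scale vectors to unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def zero_shot_anomaly_map(patch_feats, text_normal, text_abnormal, temperature=0.07):
    """Return per-patch abnormal probabilities of shape (H, W).

    patch_feats: (H, W, D) image patch embeddings (hypothetical input).
    text_normal, text_abnormal: (D,) embeddings of prompts such as
    "a photo of a normal object" / "a photo of a damaged object" (assumed wording).
    """
    patches = l2_normalize(patch_feats)
    texts = l2_normalize(np.stack([text_normal, text_abnormal]))  # (2, D)
    logits = patches @ texts.T / temperature                       # (H, W, 2)
    logits -= logits.max(axis=-1, keepdims=True)                   # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=-1, keepdims=True)                     # softmax over {normal, abnormal}
    return probs[..., 1]                                           # P(abnormal) per patch

# Usage with random stand-in embeddings (a real system would use CLIP features).
rng = np.random.default_rng(0)
feats = rng.normal(size=(7, 7, 512))
t_normal, t_abnormal = rng.normal(size=512), rng.normal(size=512)
feats[3, 3] = t_abnormal           # plant one patch that matches the "abnormal" prompt
amap = zero_shot_anomaly_map(feats, t_normal, t_abnormal)
image_score = amap.max()           # image-level score: the most anomalous patch
```

Because no anomaly labels are used, everything hinges on how well the two prompts and the patch features line up, which is exactly where the three challenges above arise.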

To address these issues, the research team proposed a new framework, FB-CLIP (Foreground-Background Disentangled CLIP), which innovates at three levels:

In text modeling, a multi-strategy text feature fusion method combines sentence-level representations, global context information, and attention-weighted features to build a richer, task-aware semantic representation and improve the model's understanding of anomaly semantics;

In visual modeling, a multi-perspective foreground-background separation mechanism decouples image features along semantic, spatial, structural, and other dimensions, and background suppression strategies reduce interference in complex scenes so that the model can focus more accurately on abnormal regions;

In cross-modal alignment, semantic consistency regularization constraints are introduced; by increasing prediction confidence and widening the semantic gap between normal and abnormal samples, they strengthen the model's ability to discriminate anomalies.
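The three components described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the equal fusion weights, the use of an object/background prompt pair as the only (semantic) foreground cue, and the entropy-plus-margin form of the regularizer are illustrative assumptions based solely on the descriptions above. The spatial and structural views of the separation mechanism are not modeled here.

```python
import numpy as np

def l2n(x, axis=-1, eps=1e-8):
    # Unit-normalize along the feature axis.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# 1) Text side: fuse three views of one prompt (hypothetical equal weights).
def fuse_text_features(token_feats, sentence_feat, weights=(1/3, 1/3, 1/3)):
    """token_feats: (T, D) per-token features; sentence_feat: (D,) sentence-level embedding."""
    global_ctx = token_feats.mean(axis=0)                                         # global context
    attn = softmax(token_feats @ sentence_feat / np.sqrt(token_feats.shape[1]))   # (T,) weights
    attn_feat = attn @ token_feats                                                # attention-weighted
    parts = (l2n(sentence_feat), l2n(global_ctx), l2n(attn_feat))
    fused = sum(w * p for w, p in zip(weights, parts))
    return l2n(fused)

# 2) Visual side: down-weight patches that look like background (semantic cue only).
def suppress_background(anomaly_map, patch_feats, object_text, background_text, temperature=0.07):
    """Return the masked (H, W) anomaly map and the (H, W) foreground probability."""
    sims = l2n(patch_feats) @ l2n(np.stack([object_text, background_text])).T  # (H, W, 2)
    fg_prob = softmax(sims / temperature)[..., 0]
    return anomaly_map * fg_prob, fg_prob

# 3) Alignment: reward confident predictions and keep normal/abnormal text apart.
def consistency_regularizer(abnormal_prob, text_normal, text_abnormal, margin=0.5, eps=1e-8):
    p = np.clip(abnormal_prob, eps, 1 - eps)
    entropy = -(p * np.log(p) + (1 - p) * np.log(1 - p)).mean()  # low when predictions are confident
    cos = float(l2n(text_normal) @ l2n(text_abnormal))
    gap_penalty = max(0.0, cos - margin)                           # > 0 if the prompts are too similar
    return entropy + gap_penalty
```

In an actual training loop the regularizer would be computed in an autodiff framework so it can be minimized by gradient descent; NumPy is used here only to keep the sketch self-contained.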

Experimental results show that FB-CLIP performs excellently on multiple industrial inspection and medical imaging datasets, particularly in fine-grained anomaly localization, with overall performance at an internationally leading level. The method can accurately detect and localize small anomalies in complex scenes without labeled abnormal samples, which gives it good prospects for practical application.

This achievement is expected to find applications in fields such as medical imaging-assisted diagnosis and industrial defect detection.

The research team led by Wang Quan at the Xi'an Institute of Optics and Fine Mechanics has long been engaged in interdisciplinary research spanning computer vision, biomedical imaging, and brain-computer intelligence. In recent years the team has made a series of important advances in these fields, with results published in CVPR 2025, Pattern Recognition, and other venues.

The IEEE/CVF Conference on Computer Vision and Pattern Recognition is one of the most influential international academic conferences in computer vision and is listed as a Class A conference by the China Computer Federation (CCF).