Object detection

Objects detected with OpenCV's deep neural network module (dnn), using a YOLOv3 model trained on the COCO dataset. This model can detect objects from 80 common classes.

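Object detection is a computer technology related to computer vision and image processing that detects instances of semantic objects of a certain class (such as people, buildings, or cars) in digital images and videos.^[1] Well-researched domains of object detection include face detection and pedestrian detection. Object detection has applications in many areas of computer vision, including image retrieval and video surveillance.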

Uses

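Object detection is widely used in computer vision tasks such as image annotation,^[2] activity recognition,^[3] face detection, face recognition, and video object co-segmentation. It is also used for tracking objects, for example tracking the ball during a football match, tracking the movement of a cricket bat, or tracking a person in a video.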

Concept

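Every object class has its own characteristic features that help in classifying it. For example, all circles are round. Object class detection makes use of these features. When looking for circles, one searches for objects whose points lie at a particular distance from a point (the center). Similarly, when looking for squares, one searches for objects whose corners are perpendicular and whose sides are of equal length. A similar approach is used for face identification, where the eyes, nose, and lips can be located, along with features such as skin color and the distance between the eyes.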
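The two geometric tests described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the candidate object is already given as a list of (x, y) points, the circle's center is known, and the square's corners are listed in order around the shape. The function names and the `rel_tol` tolerance parameter are hypothetical and do not come from any particular library.

```python
import math

def looks_like_circle(points, center, rel_tol=0.05):
    """True if every point lies at (nearly) the same distance from center."""
    distances = [math.dist(p, center) for p in points]
    mean_r = sum(distances) / len(distances)
    if mean_r == 0:
        return False
    return all(abs(d - mean_r) <= rel_tol * mean_r for d in distances)

def looks_like_square(corners, rel_tol=0.05):
    """True if four ordered corners have equal sides and perpendicular edges."""
    if len(corners) != 4:
        return False
    # Edge vectors between consecutive corners (wrapping around).
    edges = [
        (corners[(i + 1) % 4][0] - corners[i][0],
         corners[(i + 1) % 4][1] - corners[i][1])
        for i in range(4)
    ]
    lengths = [math.hypot(dx, dy) for dx, dy in edges]
    mean_len = sum(lengths) / 4
    if mean_len == 0:
        return False
    equal_sides = all(abs(l - mean_len) <= rel_tol * mean_len for l in lengths)
    # Adjacent edges are perpendicular when their dot product is near zero.
    right_angles = all(
        abs(edges[i][0] * edges[(i + 1) % 4][0]
            + edges[i][1] * edges[(i + 1) % 4][1])
        <= rel_tol * lengths[i] * lengths[(i + 1) % 4]
        for i in range(4)
    )
    return equal_sides and right_angles
```

A real detector would first have to find candidate points in the image (for instance from edges) and would not know the center in advance; the sketch only shows how a class-specific property becomes a concrete test.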

Methods

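Methods for object detection generally fall into either machine learning-based approaches or deep learning-based approaches. For machine learning approaches, it is necessary to first define features using one of the methods listed below, and then to use a technique such as a support vector machine (SVM) to perform the classification. Deep learning techniques, on the other hand, can detect objects end to end without specifically defined features, and are typically based on convolutional neural networks (CNNs).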
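The two-stage machine learning pipeline (hand-defined features first, then a classifier) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: `edge_features` is a made-up two-number feature (mean horizontal and vertical intensity change), the 4×4 patches are synthetic, and the linear SVM is trained with plain subgradient descent on the regularized hinge loss as a simplified stand-in for a proper SVM solver.

```python
def edge_features(patch):
    """Hypothetical hand-crafted features: mean absolute horizontal and
    vertical intensity differences of a grayscale patch (list of rows)."""
    rows, cols = len(patch), len(patch[0])
    horiz = sum(abs(r[x + 1] - r[x]) for r in patch for x in range(cols - 1))
    vert = sum(abs(patch[y + 1][x] - patch[y][x])
               for y in range(rows - 1) for x in range(cols))
    return [horiz / (rows * (cols - 1)), vert / ((rows - 1) * cols)]

def train_linear_svm(samples, labels, lam=0.01, lr=0.1, epochs=200):
    """Subgradient descent on the L2-regularized hinge loss (labels +1/-1)."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            score = sum(w * xi for w, xi in zip(weights, x)) + bias
            if y * score < 1:
                # Margin violated: regularization step plus hinge-loss step.
                weights = [w - lr * (lam * w - y * xi)
                           for w, xi in zip(weights, x)]
                bias += lr * y
            else:
                # Margin satisfied: only the regularization term acts.
                weights = [w - lr * lam * w for w in weights]
    return weights, bias

def classify(weights, bias, x):
    """Return +1 (object) or -1 (background) for a feature vector."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score >= 0 else -1

# Toy data: "objects" are vertically striped patches, everything else is background.
vertical = [[0, 1, 0, 1]] * 4
flat = [[0.5] * 4] * 4
horizontal = [[0] * 4, [1] * 4, [0] * 4, [1] * 4]

train_x = [edge_features(p) for p in (vertical, flat, horizontal)]
train_y = [1, -1, -1]
weights, bias = train_linear_svm(train_x, train_y)
```

After training, `classify(weights, bias, edge_features(patch))` labels a new patch. A deep learning detector would instead learn its features from the pixels directly rather than relying on a function like `edge_features`.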

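Methods for defining features or delineating regions:

Machine learning approaches:
- Viola–Jones object detection framework based on Haar features
- Scale-invariant feature transform (SIFT)
- Histogram of oriented gradients (HOG) features^[5]
Deep learning approaches:
- Region proposals (used in R-CNN,^[6] Fast R-CNN,^[7] Faster R-CNN,^[8] and Cascade R-CNN)
- Single Shot MultiBox Detector (SSD)^[9]
- You Only Look Once (YOLO)^[10] ^[11] ^[12] ^[4]
- Single-Shot Refinement Neural Network for Object Detection (RefineDet)^[13]
- Retina-Net^[14] ^[15]
- Deformable convolutional networks^[16] ^[17]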
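To give a feel for what a hand-defined feature such as HOG computes, the sketch below builds a gradient-orientation histogram for a single image cell. It is a deliberately simplified illustration, not the full descriptor: it uses central differences on interior pixels only, 9 unsigned orientation bins over 0–180°, and hard bin assignment, and it omits the vote interpolation and block normalization of the published method. The function name `orientation_histogram` is hypothetical.

```python
import math

def orientation_histogram(cell, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations
    (0-180 degrees) for one grayscale cell given as a list of rows."""
    rows, cols = len(cell), len(cell[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            # Central differences approximate the image gradient.
            gx = cell[y][x + 1] - cell[y][x - 1]
            gy = cell[y + 1][x] - cell[y - 1][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue
            # Fold the direction into [0, 180) so opposite gradients agree.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % bins] += magnitude
    return hist
```

A cell containing a vertical edge puts all of its weight in the bin around 0°, while a horizontal edge lands in the bin around 90°. Concatenating such histograms over many cells yields the kind of feature vector that a classifier such as an SVM can then consume.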

References

[1] Dasiopoulou, Stamatia, et al. "Knowledge-assisted semantic video object detection." IEEE Transactions on Circuits and Systems for Video Technology 15.10 (2005): 1210–1224.
[2] Ling Guan; Yifeng He; Sun-Yuan Kung (1 March 2012). Multimedia Image and Video Processing. CRC Press. pp. 331–. ISBN 978-1-4398-3087-1.
[3] Wu, Jianxin, et al. "A scalable approach to activity recognition based on object use." 2007 IEEE 11th International Conference on Computer Vision. IEEE, 2007.
[4] Reference "yolov4" (arXiv preprint; identifier not given in the source).
[5] Dalal, Navneet (2005). "Histograms of oriented gradients for human detection" (PDF). Computer Vision and Pattern Recognition. 1.
[6] Girshick, Ross (2014). "Rich feature hierarchies for accurate object detection and semantic segmentation" (PDF). Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE: 580–587. arXiv:1311.2524. doi:10.1109/CVPR.2014.81. ISBN 978-1-4799-5118-5.
[7] Girshick, Ross (2015). "Fast R-CNN" (PDF). Proceedings of the IEEE International Conference on Computer Vision: 1440–1448. arXiv:1504.08083. Bibcode:2015arXiv150408083G.
[8] Ren, Shaoqing (2015). "Faster R-CNN". Advances in Neural Information Processing Systems. arXiv:1506.01497.
[9] Liu, Wei (October 2016). "SSD: Single shot multibox detector". Computer Vision – ECCV 2016. European Conference on Computer Vision. Lecture Notes in Computer Science. Vol. 9905. pp. 21–37. arXiv:1512.02325. doi:10.1007/978-3-319-46448-0_2. ISBN 978-3-319-46447-3.
[10] Redmon, Joseph (2016). "You only look once: Unified, real-time object detection". Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. arXiv:1506.02640. Bibcode:2015arXiv150602640R.
[11] Redmon. "YOLO9000: better, faster, stronger".
[12] Redmon. "Yolov3: An incremental improvement".
[13] Zhang, Shifeng (2018). "Single-Shot Refinement Neural Network for Object Detection". Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition: 4203–4212. arXiv:1711.06897. Bibcode:2017arXiv171106897Z.
[14] Lin, Tsung-Yi (2020). "Focal Loss for Dense Object Detection". IEEE Transactions on Pattern Analysis and Machine Intelligence. 42 (2): 318–327. arXiv:1708.02002. Bibcode:2017arXiv170802002L. doi:10.1109/TPAMI.2018.2858826. PMID 30040631.
[15] Pang. "Libra R-CNN: Towards Balanced Learning for Object Detection".
[16] Zhu. "Deformable ConvNets v2: More Deformable, Better Results".
[17] Dai. "Deformable Convolutional Networks".

"Object Class Detection". Vision.eecs.ucf.edu. Archived from the original on 2013-07-14. Retrieved 2013-10-09.
"ETHZ – Computer Vision Lab: Publications". Vision.ee.ethz.ch. Archived from the original on 2013-06-03. Retrieved 2013-10-09.
