Midge: Generating Image Descriptions From Computer Vision Detections

Margaret Mitchell, Jesse Dodge, Amit Goyal, Kota Yamaguchi, Karl Stratos, Xufeng Han, Alyssa Mensch, Alex Berg, Tamara Berg, Hal Daumé III

U. of Aberdeen and Oregon Health and Science University
Stony Brook University: {aberg, tlberg, xufhan, kyamagu}@
U. of Maryland: {hal, amit}@
Columbia University: stratos@
U. of Washington: dodgejesse@
MIT: acmensch@

Abstract

This paper introduces a novel generation system that composes humanlike descriptions of images from computer vision detections. By leveraging syntactically informed word co-occurrence statistics, the generator filters and constrains the noisy detections output from a vision system to generate syntactic trees that detail what the computer vision system sees. Results show that the generation system outperforms state-of-the-art systems, automatically generating some of the most natural image descriptions to date.

1 Introduction

It is becoming a real possibility for intelligent systems to talk about the visual world. New ways of mapping computer vision to generated language have emerged in the past few years, with a focus on pairing detections in an image to words (Farhadi et al., 2010; Li et al., 2011; Kulkarni et al., 2011; Yang et al., 2011). The goal in connecting vision to language has varied: systems have started producing language that is descriptive and poetic (Li et al., 2011), summaries that add content where the computer vision system does not (Yang et al., 2011), and captions copied directly from other images that are globally (Farhadi et al., 2010) and locally similar (Ordonez et al., 2011).

A commonality among all of these approaches is that they aim to produce natural-sounding descriptions from computer vision detections. This commonality is our starting point: we aim to design a system capable of producing natural-sounding descriptions from computer vision detections that are …
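As a rough illustration of the filtering idea described in the abstract, the following minimal Python sketch scores each candidate attribute-object pairing reported by a vision system. It combines the two detector confidences with how often the two words co-occur as a modifier-noun pair in human captions, then drops pairings that the statistics do not support. Everything here is a hypothetical assumption rather than the paper's actual method: the toy corpus `CAPTION_PAIRS`, the functions `cooccurrence_probs` and `filter_detections`, the product scoring rule, and the `threshold` of 0.2. Midge itself goes further and generates full syntactic trees.

```python
from collections import Counter
from itertools import product

# Hypothetical toy data: (adjective, noun) modifier pairs standing in for
# statistics a parser would extract from a large corpus of human captions.
CAPTION_PAIRS = [
    ("black", "cat"), ("black", "cat"), ("wooden", "table"),
    ("wooden", "chair"), ("furry", "cat"), ("black", "dog"),
]


def cooccurrence_probs(pairs):
    """Estimate P(adjective | noun) from observed (adjective, noun) pairs."""
    pair_counts = Counter(pairs)
    noun_counts = Counter(noun for _, noun in pairs)
    return {(adj, noun): count / noun_counts[noun]
            for (adj, noun), count in pair_counts.items()}


def filter_detections(objects, attributes, probs, threshold=0.2):
    """Keep only (attribute, object) pairings supported by caption statistics.

    objects and attributes are lists of (label, detector_score) tuples.
    Each pairing is scored as attribute score * object score * P(adj | noun);
    pairings never seen together in the corpus get probability 0.
    Returns (adjective, noun, score) triples, highest score first.
    """
    kept = []
    for (adj, adj_score), (noun, obj_score) in product(attributes, objects):
        p = probs.get((adj, noun), 0.0)
        score = adj_score * obj_score * p
        if score >= threshold:
            kept.append((adj, noun, round(score, 3)))
    return sorted(kept, key=lambda triple: -triple[2])


if __name__ == "__main__":
    probs = cooccurrence_probs(CAPTION_PAIRS)
    # Hypothetical noisy detector output for a single image.
    objects = [("cat", 0.9), ("table", 0.6)]
    attributes = [("black", 0.8), ("wooden", 0.7), ("furry", 0.3)]
    print(filter_detections(objects, attributes, probs))
    # [('black', 'cat', 0.48), ('wooden', 'table', 0.42)]
```

Multiplying the scores means a pairing survives only if both detectors are reasonably confident and the words plausibly go together: "furry cat" is dropped because the attribute detector is weak, and "wooden cat" because the corpus never pairs those words. A realistic system would estimate these statistics from a large parsed caption corpus and would likely smooth unseen pairs rather than assign them zero.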
