Scientific report: "Collective Generation of Natural Image Descriptions"

Collective Generation of Natural Image Descriptions

Polina Kuznetsova, Vicente Ordonez, Alexander C. Berg, Tamara L. Berg and Yejin Choi
Department of Computer Science, Stony Brook University, Stony Brook, NY 11794-4400
{pkuznetsova, vordonezroma, aberg, tlberg, ychoi}@

Abstract

We present a holistic data-driven approach to image description generation, exploiting the vast amount of (noisy) parallel image data and associated natural language descriptions available on the web. More specifically, given a query image, we retrieve existing human-composed phrases used to describe visually similar images, then selectively combine those phrases to generate a novel description for the query image. We cast the generation process as constraint optimization problems, collectively incorporating multiple interconnected aspects of language composition for content planning, surface realization, and discourse structure. Evaluation by human annotators indicates that our final system generates more semantically correct and linguistically appealing descriptions than two nontrivial baselines.

1 Introduction

Automatically describing images in natural language is an intriguing but complex AI task, requiring accurate computational visual recognition, comprehensive world knowledge, and natural language generation. Some past research has simplified the general image description goal by assuming that relevant text for an image is provided (e.g., Aker and Gaizauskas, 2010; Feng and Lapata, 2010). This allows descriptions to be generated using effective summarization techniques with relatively surface-level image understanding. However, such text (e.g., news articles or encyclopedic text) is often only loosely related to an image's specific content, and many natural images do not come with associated text for summarization. In contrast, other recent work has focused more on the visual recognition aspect by detecting content elements (e.g., scenes, objects, attributes, actions, etc.) and then composing ...
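
The retrieve-then-combine idea stated in the abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the feature vectors, the phrase-type labels (`NP`, `VP`, `PP`), the function names, and above all the greedy "best phrase per type" selection, which merely stands in for the constraint optimization the paper describes and is not the authors' method.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def retrieve_phrases(query_vec, database, k=2):
    """Collect phrases from the k images most similar to the query.

    `database` is a hypothetical list of (feature_vector, phrases) pairs,
    where `phrases` maps a phrase type to a human-written phrase.
    Each candidate is scored by the similarity of its source image.
    """
    ranked = sorted(database, key=lambda item: cosine(query_vec, item[0]),
                    reverse=True)
    candidates = []
    for vec, phrases in ranked[:k]:
        sim = cosine(query_vec, vec)
        for phrase_type, phrase in phrases.items():
            candidates.append((phrase_type, phrase, sim))
    return candidates

def compose(candidates, order=("NP", "VP", "PP")):
    """Greedy stand-in for the optimization step: keep the highest-scoring
    phrase of each type and join the kept phrases in a fixed order."""
    best = {}
    for phrase_type, phrase, score in candidates:
        if phrase_type not in best or score > best[phrase_type][1]:
            best[phrase_type] = (phrase, score)
    return " ".join(best[t][0] for t in order if t in best)

# Hypothetical captioned-image collection with 2-D "visual" features.
database = [
    ([1.0, 0.0], {"NP": "a dog", "VP": "runs"}),
    ([0.9, 0.1], {"NP": "a puppy", "PP": "on the grass"}),
    ([0.0, 1.0], {"NP": "a car", "VP": "is parked"}),
]
description = compose(retrieve_phrases([1.0, 0.0], database, k=2))
print(description)
```

A greedy choice like this ignores interactions between phrases (for example, whether the chosen verb phrase fits the chosen noun phrase), which is the kind of interdependence a joint constraint-optimization formulation is meant to capture.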
