HadoopPerceptron: a Toolkit for Distributed Perceptron Training and Prediction with MapReduce

Andrea Gesmundo
Computer Science Department, University of Geneva, Geneva, Switzerland

Nadi Tomeh
LIMSI-CNRS and Université Paris-Sud, Orsay, France

Abstract

We propose a set of open-source software modules to perform structured Perceptron training, prediction, and evaluation within the Hadoop framework. Apache Hadoop is a freely available environment for running distributed applications on a computer cluster. The software is designed within the MapReduce paradigm. Thanks to distributed computing, the proposed software substantially reduces execution times while handling huge data sets. The distributed Perceptron training algorithm preserves the convergence properties, and hence the accuracy, of the serial Perceptron. The presented modules can be executed as stand-alone software or easily extended or integrated into complex systems. The execution of the modules applied to specific NLP tasks can be demonstrated and tested via an interactive web interface that allows the user to inspect the status and structure of the cluster and to interact with the MapReduce jobs.

1 Introduction

The Perceptron training algorithm (Rosenblatt, 1958; Freund and Schapire, 1999; Collins, 2002) is widely applied in the Natural Language Processing community for learning complex structured models.
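To give a feel for the kind of scheme the abstract describes, the following is a minimal, illustrative sketch of distributed Perceptron training by iterative parameter mixing, one MapReduce-friendly approach known to preserve the serial Perceptron's convergence guarantees (McDonald et al., 2010). Each "map" task trains on one data shard; the "reduce" step averages the per-shard weight vectors. All names here are hypothetical and do not reflect the toolkit's actual API.

```python
# Illustrative sketch only: distributed Perceptron training by iterative
# parameter mixing. Function names are hypothetical, not HadoopPerceptron's API.

def score(w, feats, label):
    """Linear score of a label under weight vector w (a dict)."""
    return sum(v * w.get((f, label), 0.0) for f, v in feats.items())

def predict(w, feats, labels):
    """Inference step: argmax over candidate labels."""
    return max(labels, key=lambda y: score(w, feats, y))

def perceptron_epoch(shard, w, labels):
    """'Map' step: one Perceptron pass over a single data shard."""
    w = dict(w)  # each mapper updates a local copy of the weights
    for feats, gold in shard:
        pred = predict(w, feats, labels)
        if pred != gold:  # standard Perceptron update on a mistake
            for f, v in feats.items():
                w[(f, gold)] = w.get((f, gold), 0.0) + v
                w[(f, pred)] = w.get((f, pred), 0.0) - v
    return w

def mix(weight_vectors):
    """'Reduce' step: average the per-shard weight vectors."""
    mixed = {}
    for w in weight_vectors:
        for k, v in w.items():
            mixed[k] = mixed.get(k, 0.0) + v / len(weight_vectors)
    return mixed

def train(shards, labels, epochs=10):
    """Alternate map (per-shard training) and reduce (mixing) phases."""
    w = {}
    for _ in range(epochs):
        w = mix([perceptron_epoch(s, w, labels) for s in shards])
    return w
```

In an actual Hadoop deployment the per-shard passes run as parallel map tasks and the averaging as a reduce task, which is what yields the execution-time savings on large data sets.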
The non-probabilistic nature of the Perceptron's parameters makes it possible to incorporate arbitrary features without the need to compute a partition function, which is required by its discriminative probabilistic counterparts such as CRFs (Lafferty et al., 2001). Additionally, the Perceptron is robust to approximate inference in large search spaces. Nevertheless, Perceptron training time is proportional to inference time, which is frequently non-linear in the input sequence size. Therefore, training can be time-consuming for complex model structures. Furthermore, for an ...
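To make the contrast concrete, the standard Perceptron update for a training pair (x, y) needs only the single highest-scoring structure, whereas the CRF gradient additionally requires a normalizing sum over all candidate structures:

```latex
% Perceptron: one argmax (inference) per update
\hat{y} = \operatorname*{arg\,max}_{y'} \; w \cdot \Phi(x, y'),
\qquad
w \leftarrow w + \Phi(x, y) - \Phi(x, \hat{y})

% CRF: requires the partition function over all outputs y'
Z(x) = \sum_{y'} \exp\bigl(w \cdot \Phi(x, y')\bigr)
```

Since each Perceptron update is dominated by the argmax, its total training cost scales with the cost of inference, which is the proportionality noted above.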
