HadoopPerceptron: a Toolkit for Distributed Perceptron Training and Prediction with MapReduce

Andrea Gesmundo
Computer Science Department, University of Geneva, Geneva, Switzerland

Nadi Tomeh
LIMSI-CNRS and Université Paris-Sud, Orsay, France

Abstract

We propose a set of open-source software modules to perform structured Perceptron training, prediction, and evaluation within the Hadoop framework. Apache Hadoop is a freely available environment for running distributed applications on a computer cluster. The software is designed within the MapReduce paradigm. Thanks to distributed computing, the proposed software substantially reduces execution times while handling huge data sets. The distributed Perceptron training algorithm preserves the convergence properties, and hence the accuracy, of the serial Perceptron. The presented modules can be executed as stand-alone software or easily extended or integrated into complex systems. The execution of the modules applied to specific NLP tasks can be demonstrated and tested via an interactive web interface that allows the user to inspect the status and structure of the cluster and to interact with the MapReduce jobs.

1 Introduction

The Perceptron training algorithm (Rosenblatt, 1958; Freund and Schapire, 1999; Collins, 2002) is widely applied in the Natural Language Processing community for learning complex structured models.
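To give a feel for the kind of scheme the abstract describes, the following is a minimal, illustrative sketch of distributed Perceptron training by iterative parameter mixing, one MapReduce-friendly approach known to preserve the serial Perceptron's convergence guarantees (McDonald et al., 2010). Each "map" task trains on one data shard; the "reduce" step averages the per-shard weight vectors. All names here are hypothetical and do not reflect the toolkit's actual API.

```python
# Illustrative sketch only: distributed Perceptron training by iterative
# parameter mixing. Function names are hypothetical, not HadoopPerceptron's API.

def score(w, feats, label):
    """Linear score of a label under weight vector w (a dict)."""
    return sum(v * w.get((f, label), 0.0) for f, v in feats.items())

def predict(w, feats, labels):
    """Inference step: argmax over candidate labels."""
    return max(labels, key=lambda y: score(w, feats, y))

def perceptron_epoch(shard, w, labels):
    """'Map' step: one Perceptron pass over a single data shard."""
    w = dict(w)  # each mapper updates a local copy of the weights
    for feats, gold in shard:
        pred = predict(w, feats, labels)
        if pred != gold:  # standard Perceptron update on a mistake
            for f, v in feats.items():
                w[(f, gold)] = w.get((f, gold), 0.0) + v
                w[(f, pred)] = w.get((f, pred), 0.0) - v
    return w

def mix(weight_vectors):
    """'Reduce' step: average the per-shard weight vectors."""
    mixed = {}
    for w in weight_vectors:
        for k, v in w.items():
            mixed[k] = mixed.get(k, 0.0) + v / len(weight_vectors)
    return mixed

def train(shards, labels, epochs=10):
    """Alternate map (per-shard training) and reduce (mixing) phases."""
    w = {}
    for _ in range(epochs):
        w = mix([perceptron_epoch(s, w, labels) for s in shards])
    return w
```

In an actual Hadoop deployment the per-shard passes run as parallel map tasks and the averaging as a reduce task, which is what yields the execution-time savings on large data sets.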
The non-probabilistic nature of the Perceptron's parameters makes it possible to incorporate arbitrary features without the need to compute a partition function, which is required by its discriminative probabilistic counterparts such as CRFs (Lafferty et al., 2001). Additionally, the Perceptron is robust to approximate inference in large search spaces. Nevertheless, Perceptron training time is proportional to inference time, which is frequently non-linear in the input sequence size. Therefore, training can be time-consuming for complex model structures. Furthermore, for an ...
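To make the contrast concrete, the standard Perceptron update for a training pair (x, y) needs only the single highest-scoring structure, whereas the CRF gradient additionally requires a normalizing sum over all candidate structures:

```latex
% Perceptron: one argmax (inference) per update
\hat{y} = \operatorname*{arg\,max}_{y'} \; w \cdot \Phi(x, y'),
\qquad
w \leftarrow w + \Phi(x, y) - \Phi(x, \hat{y})

% CRF: requires the partition function over all outputs y'
Z(x) = \sum_{y'} \exp\bigl(w \cdot \Phi(x, y')\bigr)
```

Since each Perceptron update is dominated by the argmax, its total training cost scales with the cost of inference, which is the proportionality noted above.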
