TAILIEUCHUNG - Scientific report: "Linguistic Profiling for Author Recognition and Verification"

Linguistic Profiling for Author Recognition and Verification

Hans van Halteren
Language and Speech, Univ. of Nijmegen
Box 9103, NL-6500 HD Nijmegen, The Netherlands
hvh@

Abstract

A new technique is introduced, linguistic profiling, in which large numbers of counts of linguistic features are used as a text profile, which can then be compared to average profiles for groups of texts. The technique proves to be quite effective for authorship verification and recognition. The best parameter settings yield a False Accept Rate of [...] at a False Reject Rate equal to zero for the verification task on a test corpus of student essays, and a 2-way recognition accuracy of [...] on the same corpus.

1 Introduction

There are several situations in language research or language engineering where we need a specific type of extra-linguistic information about a text document, and we would like to determine this information on the basis of linguistic properties of the text. Examples are the determination of the language variety or genre of a text, or a classification for document routing or information retrieval. For each of these applications, techniques have been developed that focus on specific aspects of the text, often based on frequency counts of function words in linguistics and of content words in language engineering.

In the technique we are introducing in this paper, linguistic profiling, we make no a priori choice for a specific type of word (or more complex feature) to be counted. Instead, all possible features are included, and the statistics for the texts under consideration and the distinction to be made determine how much weight, if any, each feature is to receive. Furthermore, the frequency counts are not used as absolute values, but rather as deviations from a norm, which is again determined by the situation at hand. Our hypothesis is that this technique can bring a useful contribution to all tasks where it is necessary to distinguish one group of texts from [...]
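To make the idea of a profile built from deviations from a norm more concrete, here is a minimal sketch in Python. It is not the paper's implementation: the excerpt gives no formulas, so the z-score normalization, the mean absolute difference used as a distance, the restriction to word unigrams as features, and all function names (`relative_counts`, `build_norm`, `profile`, `average_profile`, `distance`) are assumptions made purely for illustration.

```python
from collections import Counter
from statistics import mean, pstdev


def relative_counts(text, vocabulary):
    """Relative frequency of each vocabulary item in a text (assumed feature: word unigrams)."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[word] / total for word in vocabulary]


def build_norm(texts, vocabulary):
    """Per-feature mean and standard deviation over a reference set of texts (the assumed norm)."""
    rows = [relative_counts(t, vocabulary) for t in texts]
    columns = list(zip(*rows))
    return [mean(c) for c in columns], [pstdev(c) for c in columns]


def profile(text, vocabulary, norm):
    """Express each feature count as a deviation from the norm (assumed here: a z-score)."""
    means, stdevs = norm
    freqs = relative_counts(text, vocabulary)
    return [(f - m) / s if s > 0 else 0.0 for f, m, s in zip(freqs, means, stdevs)]


def average_profile(profiles):
    """Average profile for a group of texts, e.g. all known texts by one author."""
    return [mean(column) for column in zip(*profiles)]


def distance(p, q):
    """Mean absolute difference between two profiles (an assumed, simple comparison measure)."""
    return sum(abs(a - b) for a, b in zip(p, q)) / len(p)


# Toy data, invented for this sketch only.
texts_a = ["the cat sat on the mat", "the dog sat on the rug"]
texts_b = ["a cat runs and a dog runs", "a bird sings and a fish swims"]
vocabulary = ["the", "a", "on", "and"]

norm = build_norm(texts_a + texts_b, vocabulary)
avg_a = average_profile([profile(t, vocabulary, norm) for t in texts_a])
avg_b = average_profile([profile(t, vocabulary, norm) for t in texts_b])

unknown = profile("the bird sat on the mat", vocabulary, norm)
dist_a = distance(unknown, avg_a)
dist_b = distance(unknown, avg_b)
closest = "A" if dist_a < dist_b else "B"
```

In this toy setup the unknown text is attributed to whichever group's average profile lies closer. A verification setting would instead compare the distance to a single author's profile against a threshold, which is where the trade-off between False Accept Rate and False Reject Rate mentioned in the abstract would arise.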
