Document Grinding and Database Digging Chapter 4 151

This search located one e-mail address, jg65_83@yahoo.com, but also keyed on store.yahoo.com, which is not a valid e-mail address. In cases like this, the best option for locating specific strings lies in the use of regular expressions. This involves downloading the documents you want to search (which you most likely found with a Google search) and parsing those files for the information you're looking for. You could opt to automate the process of downloading these files, as we'll show in Chapter 12, but once you have downloaded the files, you'll need an easy way to search them for interesting information. Consider the following Perl script:

#!/usr/bin/perl
#
# Usage: ./ssearch.pl FILE_TO_SEARCH WORDLIST
#
# Locate words in a file, coded by James Foster
#
use strict;
use warnings;

@ARGV == 2 or die "Usage: $0 FILE_TO_SEARCH WORDLIST\n";

# Open the file to search and the word list (one word or regex per line).
open(my $searchfile, '<', $ARGV[0]) or die "Can not open searchfile because $!";
open(my $wordfile,   '<', $ARGV[1]) or die "Can not open wordfile because $!";

# Read the word list once, strip newlines, and drop blank lines:
# an empty pattern in m// would silently reuse the last successful match.
my @WORDS = grep { length } map { chomp; $_ } <$wordfile>;
close($wordfile);

my $LineCount = 0;
while (my $line = <$searchfile>) {
    ++$LineCount;                 # number of lines read from the search file
    foreach my $word (@WORDS) {
        if ($line =~ m/$word/) {
            print "$&\n";         # print the text that matched
            last;                 # report at most one match per line
        }
    }
}
close($searchfile);

This script accepts two arguments: a file to search and a list of words to search for. As it stands, this program is rather simplistic, acting as nothing more than a glorified grep script. However, the script becomes much more powerful when the word list contains regular expressions instead of plain words. For example, consider the following regular expression, written by Don Ranta:

[a-zA-Z0-9._-]+@(([a-zA-Z0-9_-]{2,99}\.)+[a-zA-Z]{2,4}|(25[0-5]|2[0-4]\d|1\d\d|[1-9]\d|[1-9])\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]\d|[1-9])\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]\d|[1-9])\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]\d|[1-9]))

Unless you're somewhat skilled with regular expressions, this might look like a bunch of garbage text. This regular expression is very powerful, however, and will locate various forms of e-mail address. These include addresses whose domain part is a host name as well as addresses whose domain part is a dotted IPv4 address. Because it requires an @ sign, it will not key on a bare host name such as store.yahoo.com. Let's take a look at this regular expression in action. For this example, we'll save the results of a Google Groups search for