Google's Part in an Information Collection Framework, Chapter 5

     6  my $end;
     7  my $token = "<div class=g>";
     8
     9  while (1) {
    10      $start = index($result, $token, $start);
    11      $end = index($result, $token, $start + 1);
    12      if ($start == -1 || $end == -1 || $start == $end) {
    13          last;
    14      }
    15
    16      my $snippet = substr($result, $start, $end - $start);
    17      print "\n-----\n" . $snippet . "\n-----\n";
    18      $start = $end;
    19  }

While this script is a little more complex, it's still really simple. In this script we've put the "<div class=g>" string into a token, because we are going to use it more than once. This also makes it easy to change when Google decides to call it something else. In lines 9 through 19 a loop is constructed that keeps looking for the token until it can no longer be found. If it does not find a token (line 12), the loop simply exits. In line 18 we move the position from which we start searching for the token to the position where our previous search ended.

Running this script results in the different HTML snippets being sent to standard output. But this is only so useful. What we really want is to extract the URL, the title, and the summary from the snippet. For this we need a function that accepts four parameters: a string that contains the starting token, a string that contains the ending token, a scalar that says where to search from, and a string that contains the HTML we want to search within. We want this function to return the section that was extracted, as well as the new position where we are within the passed string.
Such a function looks like this:

    sub cutter {
        my ($starttok, $endtok, $where, $str) = @_;
        my $startcut = index($str, $starttok, $where) + length($starttok);
        my $endcut = index($str, $endtok, $startcut + 1);
        my $returner = substr($str, $startcut, $endcut - $startcut);
        my @ares;
        push @ares, $endcut;
        push @ares, $returner;
        return @ares;
    }

Now that we have this function, we can inspect the HTML and decide how to extract the URL, the summary, and the title from each snippet.
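To make the calling convention concrete, here is a minimal sketch of how cutter might be chained to pull three fields out of one snippet. The sample HTML, the `<span class=s>` tag, and the start and end tokens are all assumptions made up for illustration; they are not Google's actual markup, which has to be inspected before choosing real tokens.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# cutter as described in the text: returns (new position, extracted text).
sub cutter {
    my ($starttok, $endtok, $where, $str) = @_;
    my $startcut = index($str, $starttok, $where) + length($starttok);
    my $endcut   = index($str, $endtok, $startcut + 1);
    my $returner = substr($str, $startcut, $endcut - $startcut);
    return ($endcut, $returner);
}

# Hypothetical snippet: NOT real Google markup, just enough structure to test with.
my $snippet = '<div class=g><a href="http://www.example.com/">Example title</a>'
            . '<span class=s>Example summary text</span></div>';

# Each call starts searching where the previous call stopped.
my ($pos, $url, $title, $summary) = (0);
($pos, $url)     = cutter('<a href="', '"', $pos, $snippet);            # assumed tokens
($pos, $title)   = cutter('>', '</a>', $pos, $snippet);                 # assumed tokens
($pos, $summary) = cutter('<span class=s>', '</span>', $pos, $snippet); # assumed tokens

print "URL:     $url\n";
print "Title:   $title\n";
print "Summary: $summary\n";
```

Feeding the returned position back in as the next `$where` is what lets the calls walk through the snippet from left to right. Note that cutter does not check whether index returned -1, so a missing token yields a meaningless cut; a more defensive version would test for that before calling substr.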