TAILIEUCHUNG - Apress - Smart Home Automation with Linux (2010)- P42

Linux users can now control their homes remotely! Are you a Linux user who has ever wanted to turn on the lights in your house, or open and close the curtains, while away on holiday? Want to be able to play the same music in every room, controlled from your laptop or mobile phone? Do you want to do these things without an expensive off-the-shelf kit?

CHAPTER 6 DATA SOURCES

Once you are able to describe the location of the data in human terms, you can start writing the code. The process involves a mechanized agent that is able to load the web page and traverse links, and a stream processor that skips over the HTML tags. You begin the scraping with a fairly common loading block like this:

    #!/usr/bin/perl -w
    use strict;
    use WWW::Mechanize;
    use HTML::TokeParser;

    my $agent = WWW::Mechanize->new();
    $agent->get("http...");

    my $stream = HTML::TokeParser->new(\$agent->content());

Given the stream, you can now skip to the fourth table, for example, by jumping over four of the opening table tags using the following:

    foreach (1..4) {
        $stream->get_tag("table");
    }

Notice that get_tag positions the stream point immediately after the opening tag given, in this case table. Consequently, the stream point is now inside the fourth table. Since our data is on the first row, you don't need to worry about skipping the tr tag, so you can jump straight into the second column with this:

    $stream->get_tag("td");
    $stream->get_tag("td");

since skipping the td tag will automatically skip the preceding tr. The stream is now positioned exactly where you want it. The HTML structure of this block is as follows:

    <a href="url">Main title</a></td>
    <td valign="top">Main story text

So far, I have been using get_tag to skip elements, but it also sports a return value containing the contents of the tag. So, you'd retrieve the information from the anchor with the following, which by its nature can return multiple tags:

    my @link = $stream->get_tag("a");

Since you know there is only one in this particular HTML, it is $link[0] that is of interest.
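
The positioning steps described above can be tried without a network connection by handing HTML::TokeParser a string instead of a downloaded page. The following is only a minimal sketch under stated assumptions: the page layout, the filler cells, and the story.html link are all hypothetical, chosen to mimic a page whose story sits in the second cell of the first row of the fourth table.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical page: three filler tables, then a fourth table whose
# first row holds the story link in its second cell.
my $html = join '', map { "<table><tr><td>filler $_</td></tr></table>" } 1 .. 3;
$html .= '<table><tr><td>side</td>'
       . '<td><a href="story.html">Main title</a></td>'
       . '<td valign="top">Main story text</td></tr></table>';

# Passing a scalar reference makes the parser read the string itself
my $stream = HTML::TokeParser->new(\$html)
    or die "Cannot create parser";

# Skip four opening table tags; the stream is now inside table four
$stream->get_tag('table') for 1 .. 4;

# Skip two opening td tags; the tr in between is passed over as well
$stream->get_tag('td') for 1 .. 2;

# Called without arguments, get_tag returns whichever tag comes next,
# which shows where the stream point has ended up
my $next     = $stream->get_tag;
my $next_tag = $next->[0];
print "Next tag: $next_tag\n";
```

If the stream has been positioned as intended, the tag reported at the end is the anchor inside the second cell. Using an inline string keeps the experiment repeatable; once the tag counts are right, the same calls can be pointed at content fetched by the agent.
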
Inside $link[0] is another array containing the following:

    $link[0][0]    tag
    $link[0][1]    attributes
    $link[0][2]    attribute sequence
    $link[0][3]    text

Therefore, you can extract the link information with the following:

    my $href = $link[0][1]{href};

And since get_tag only retrieves the information about the tag, you must return to the stream to extract all the data between this a and
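
The four-element layout of the value returned by get_tag can be inspected with a short script. This is a sketch rather than the book's own listing: the HTML fragment, the story.html URL, and the extra class attribute are hypothetical, added only so that the attribute sequence has more than one entry.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical fragment shaped like the block described above
my $html = '<td><a href="story.html" class="lead">Main title</a></td>'
         . '<td valign="top">Main story text</td>';

my $stream = HTML::TokeParser->new(\$html)
    or die "Cannot create parser";

# List assignment places the tag description in $link[0]
my @link = $stream->get_tag('a');

my $tag      = $link[0][0];   # tag name
my $attrs    = $link[0][1];   # hash reference of attribute values
my $attr_seq = $link[0][2];   # array reference: attribute names in source order
my $text     = $link[0][3];   # the tag as it was written in the HTML

# The link target is one entry in the attribute hash
my $href = $link[0][1]{href};

print "tag:   $tag\n";
print "href:  $href\n";
print "order: @$attr_seq\n";
print "text:  $text\n";
```

The attribute hash alone does not preserve the order in which attributes were written, which is why the attribute sequence is supplied as a separate element.
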
