beryllium: a selenium-based crawler framework for writing crawler scripts quickly
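Preface
While working on a crawler project for smart tourism, I ran into pages whose content is loaded dynamically by JavaScript. That meant I had to use selenium, but driving selenium directly is cumbersome. So I developed a crawler library for writing crawler scripts quickly.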
beryllium demo
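# -*- coding: utf-8 -*-
import time

from beryllium import Beryllium
from beryllium import FieldList, Field, FieldName, Page, ListCssSelector, Mongodb, NextPageCssSelectorSetup, PageFunc

# Create the crawler and attach a selenium driver to it.
bery = Beryllium()
bery.driver = bery.get_driver()

# Open Baidu, type the query "杭州" (Hangzhou) into the search box (#kw), and press Enter.
bery.fast_get_page("https://www.baidu.com")
time.sleep(1)
bery.until_send_text_by_css_selector(css_selector="#kw", text="杭州")
bery.until_send_enter_by_css_selector(css_selector="#kw")
time.sleep(2)  # give the results page time to load

# Fields to extract from each list item: here, the h3 title of a search result.
fieldlist_shop = FieldList(
    Field(field_name=FieldName.SHOP_NAME, css_selector="h3"),
)

# A page is a list of items (one per search result) plus the fields to read from each item.
page_shop = Page(name="shop_page",
                 field_list=fieldlist_shop,
                 list_css_selector=ListCssSelector(list_css_selector="#content_left > div.result.c-container"))

# Extract the data from each results page, then click the next-page link (#page > a.n)
# until no next-page link is left.
bery.until_click_no_next_page_by_css_selector(
    next_page_setup=NextPageCssSelectorSetup(
        page=page_shop,
        css_selector="#page > a.n",
        main_page_func=PageFunc(func=bery.from_page_get_data_list, page=page_shop)
    )
)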
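For comparison, here is a rough sketch of the loop that the last call in the demo appears to hide: read one field from every list item, click the next-page link, and stop when there is none. This is not beryllium's implementation. The function name `crawl_all_pages` and the `max_pages` and `pause` parameters are hypothetical, and the sketch assumes the next-page selector matches only the next-page link. It relies only on selenium's `find_elements(by, value)`, `.text`, and `.click()`, so any selenium WebDriver instance can be passed in.

```python
import time

CSS = "css selector"  # the string value of selenium's By.CSS_SELECTOR


def crawl_all_pages(driver, item_selector, field_selector, next_selector,
                    max_pages=50, pause=0.0):
    """Collect one text field per list item, following next-page links.

    Hypothetical helper for illustration; not part of beryllium.
    """
    rows = []
    for _ in range(max_pages):  # safety cap so a bad selector cannot loop forever
        # Each list item is one result block; read the field inside it.
        for item in driver.find_elements(CSS, item_selector):
            fields = item.find_elements(CSS, field_selector)
            if fields:
                rows.append(fields[0].text)
        # Stop when the page no longer has a next-page link.
        next_links = driver.find_elements(CSS, next_selector)
        if not next_links:
            break
        next_links[0].click()
        time.sleep(pause)  # crude fixed wait for the next page to render
    return rows
```

With the selectors from the demo, the call would be `crawl_all_pages(driver, "#content_left > div.result.c-container", "h3", "#page > a.n", pause=2)`. Even this reduced version has to deal with element lookup, the stop condition, and waiting, which is the boilerplate that the `Page`, `FieldList`, and `NextPageCssSelectorSetup` objects are meant to replace.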