beryllium: a Selenium-based crawler framework for quickly writing crawler scripts

Preface

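While working on a crawler for a smart-tourism project, I ran into pages whose content is loaded dynamically by JavaScript, which meant I had to use Selenium. Using Selenium directly was too cumbersome, though, so I developed a crawler library for writing crawler scripts quickly.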

beryllium demo

# -*- coding: utf-8 -*-

import time

from beryllium import (Beryllium, FieldList, Field, FieldName, Page,
                       ListCssSelector, Mongodb, NextPageCssSelectorSetup, PageFunc)

# Create the crawler and attach a browser driver to it.
bery = Beryllium()
bery.driver = bery.get_driver()

# Open Baidu, type the query "杭州" (Hangzhou) into the search box (#kw) and press Enter.
bery.fast_get_page("https://www.baidu.com")
time.sleep(1)
bery.until_send_text_by_css_selector(css_selector="#kw", text="杭州")
bery.until_send_enter_by_css_selector(css_selector="#kw")
time.sleep(2)

# Fields to extract from each list item: here only the <h3> title.
fieldlist_shop = FieldList(
    Field(field_name=FieldName.SHOP_NAME, css_selector="h3"),
)
# A page is a field list plus the CSS selector that matches each result entry.
page_shop = Page(name="shop_page",
                 field_list=fieldlist_shop,
                 list_css_selector=ListCssSelector(list_css_selector="#content_left > div.result.c-container"))

# Run the extraction function on each page and click the next-page link
# ("#page > a.n") until there is no next page.
bery.until_click_no_next_page_by_css_selector(
    next_page_setup=NextPageCssSelectorSetup(
        page=page_shop,
        css_selector="#page > a.n",
        main_page_func=PageFunc(func=bery.from_page_get_data_list, page=page_shop)
    )
)
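
The last call in the demo combines two steps: extract the list data from the current page, then click the next-page link until none is left. The sketch below illustrates that control flow in plain Python, with no browser involved. It is not beryllium's implementation: `crawl_until_no_next_page`, `FakePager`, the `max_pages` cap, and the `"shop_name"` key are all hypothetical names introduced for illustration only.

```python
from typing import Callable, Iterable, List, Optional


def crawl_until_no_next_page(
    extract_rows: Callable[[], Iterable[dict]],
    go_to_next_page: Callable[[], bool],
    max_pages: Optional[int] = None,
) -> List[dict]:
    """Collect rows from every page until there is no next page.

    extract_rows    -- returns the rows found on the current page
    go_to_next_page -- moves to the next page; returns False if none exists
    max_pages       -- optional safety cap (hypothetical, not a beryllium option)
    """
    rows: List[dict] = []
    pages_seen = 0
    while True:
        rows.extend(extract_rows())
        pages_seen += 1
        if max_pages is not None and pages_seen >= max_pages:
            break
        if not go_to_next_page():
            break
    return rows


class FakePager:
    """In-memory stand-in for a browser showing paginated results."""

    def __init__(self, pages):
        self.pages = pages
        self.index = 0

    def extract_rows(self):
        # One dict per list item, similar in spirit to one Field per item.
        return [{"shop_name": title} for title in self.pages[self.index]]

    def go_to_next_page(self):
        if self.index + 1 >= len(self.pages):
            return False  # no next-page link left
        self.index += 1
        return True


pager = FakePager([["a", "b"], ["c"], ["d", "e"]])
result = crawl_until_no_next_page(pager.extract_rows, pager.go_to_next_page)
print([row["shop_name"] for row in result])  # ['a', 'b', 'c', 'd', 'e']
```

In a real script the two callables would wrap browser actions (reading the matched elements and clicking the next-page link). Passing them in as functions keeps the loop itself easy to test without a browser.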

Demonstration