
Requests-HTML is a very simple, intuitive library for parsing HTML, including asynchronous parsing.
Installation:
$ pip install requests-html
Usage example:
1️⃣
from requests_html import HTMLSession
session = HTMLSession()
r = session.get('https://python.org/')
2️⃣
from requests_html import AsyncHTMLSession
asession = AsyncHTMLSession()
async def get_pythonorg():
    r = await asession.get('https://python.org/')
    return r

async def get_reddit():
    r = await asession.get('https://reddit.com/')
    return r

async def get_google():
    r = await asession.get('https://google.com/')
    return r

# run() takes the coroutine functions themselves, not awaited calls
results = asession.run(get_pythonorg, get_reddit, get_google)
results  # check the requests all returned a 200 (success) code
# [<Response [200]>, <Response [200]>, <Response [200]>]
for result in results:
    print(result.html.url)
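
To see why the async variant pays off, here is a minimal sketch of the same pattern using only the standard library. It is not how Requests-HTML is implemented: `fake_fetch` is a hypothetical stand-in that sleeps instead of making a request, the 0.2 s delays are made up, and `asyncio.gather` is swapped in for `asession.run`.

```python
import asyncio
import time


async def fake_fetch(url: str, delay: float) -> str:
    # Hypothetical stand-in for a network call: sleeps instead of fetching.
    await asyncio.sleep(delay)
    return url


async def fetch_all() -> list:
    # gather() runs the coroutines concurrently and returns results
    # in the same order as its arguments.
    return await asyncio.gather(
        fake_fetch('https://python.org/', 0.2),
        fake_fetch('https://reddit.com/', 0.2),
        fake_fetch('https://google.com/', 0.2),
    )


start = time.perf_counter()
urls = asyncio.run(fetch_all())
elapsed = time.perf_counter() - start

print(urls)
print(f'{elapsed:.2f}s')  # close to one delay, not the sum of all three
```

The three simulated requests wait at the same time, so the total is roughly the longest single wait rather than all of them added together.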
Out of the box it supports cookie persistence, a mocked user-agent, asynchronous requests, JavaScript, and other goodies.
#python #github #soft