Scraping Web Data with PyCharm: A Detailed Guide to Scraping Web Images with Python

2023-12-09


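Contents

What Is a Crawler
Steps to Scrape a Web Image
- Step 1: Open the target website (any website will do)
- Step 2: Access the website with Python
- Step 3: Press F12 to look up the relevant information
- Step 4: Scrape the image and download it locally
- Step 5: Display test
Core Code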

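What Is a Crawler

A web crawler (also called a web spider or web robot) is a program that simulates a browser: it sends network requests, receives the responses, and automatically collects information from the internet according to a set of rules.
In principle, a crawler can do anything that a browser (the client) can do.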
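To make "automatically collects information according to a set of rules" concrete, here is a minimal sketch of a crawl loop. It is an illustration rather than the method used later in this article: the `crawl` function, the `LinkCollector` class, and the in-memory `FAKE_SITE` dictionary are all hypothetical names, and the fake site stands in for real HTTP responses so the example runs without a network connection.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def crawl(start_url, fetch, max_pages=10):
    """Breadth-first crawl. `fetch` maps a URL to HTML text, or None on failure."""
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # skip pages that could not be fetched
        visited.append(url)
        parser = LinkCollector()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited


# Hypothetical in-memory "site" standing in for real HTTP responses.
FAKE_SITE = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b">B</a>',
    "http://example.test/b": '<a href="/">home</a>',
}

pages = crawl("http://example.test/", FAKE_SITE.get)
print(pages)
# ['http://example.test/', 'http://example.test/a', 'http://example.test/b']
```

Against a real site, `fetch` could be a small wrapper around `requests.get(url).text`; the "rules" are then whatever the parser chooses to extract and follow.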

Steps to Scrape a Web Image

Step 1: Open the target website (any website will do)

http://github.com/

Step 2: Access the website with Python

import requests

headers = {'User-Agent': 'python-requests/2.25.1', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}
response = requests.get('http://github.com/', headers=headers)
print(response.request.headers)  # the headers that were actually sent

Output:

{'User-Agent': 'python-requests/2.25.1', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}

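Step 3: Press F12 to look up the relevant information

Press F12 to open the browser's developer tools, locate the image information on the page, and read the request headers.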
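The manual lookup in the developer tools can also be done in code. The sketch below uses only the standard library's `html.parser` to list the `src` of every `<img>` tag. The `ImageCollector` class, the `find_images` function, and the `SAMPLE_HTML` fragment are hypothetical; on a real page the HTML would come from `response.text`, and the markup would differ.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageCollector(HTMLParser):
    """Collects the src of every <img> tag, resolved against base_url."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src:  # ignore <img> tags without a source
            self.image_urls.append(urljoin(self.base_url, src))


def find_images(html, base_url):
    """Return the absolute URLs of all images found in the HTML text."""
    parser = ImageCollector(base_url)
    parser.feed(html)
    return parser.image_urls


# Hypothetical page fragment; a real page would come from response.text.
SAMPLE_HTML = """
<div>
  <img class="avatar" src="https://avatars.githubusercontent.com/nplasterer?s=64&v=4">
  <img src="/images/logo.png" alt="logo">
  <img alt="no source">
</div>
"""

image_urls = find_images(SAMPLE_HTML, "http://github.com/")
for url in image_urls:
    print(url)
```

Relative paths such as `/images/logo.png` are resolved against the page URL with `urljoin`, so every result can be passed directly to a download call.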

Step 4: Scrape the image and download it locally

import requests

headers = {'User-Agent': 'python-requests/2.25.1', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}
response = requests.get('https://avatars.githubusercontent.com/nplasterer?s=64&v=4', headers=headers)
print(response.request.headers)
with open('icon.ico', 'wb') as f:  # 'wb': the image is written as raw bytes
    f.write(response.content)
print("Image downloaded successfully")
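The snippet above writes whatever the server returns to `icon.ico`, so an HTML error page or a PNG would be saved under the same name. One possible safeguard, sketched here under the assumption that only a few common formats matter, is to look at the first bytes of the data before saving. The helper names `guess_extension` and `save_image` are hypothetical, and `fake_png` stands in for `response.content`.

```python
import tempfile
from pathlib import Path

# Leading bytes ("magic numbers") of some common image formats.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", ".png"),
    (b"\xff\xd8\xff", ".jpg"),
    (b"GIF87a", ".gif"),
    (b"GIF89a", ".gif"),
    (b"\x00\x00\x01\x00", ".ico"),
]


def guess_extension(data):
    """Return a file extension based on the first bytes, or None if unknown."""
    for signature, extension in SIGNATURES:
        if data.startswith(signature):
            return extension
    return None


def save_image(data, stem, directory="."):
    """Write image bytes to <directory>/<stem><ext>; raise if the type is unknown."""
    extension = guess_extension(data)
    if extension is None:
        raise ValueError("data does not look like a PNG, JPEG, GIF, or ICO image")
    path = Path(directory) / f"{stem}{extension}"
    path.write_bytes(data)
    return path


# Stand-in for response.content: a PNG signature followed by filler bytes.
fake_png = b"\x89PNG\r\n\x1a\n" + b"\x00" * 16

with tempfile.TemporaryDirectory() as tmp:
    saved = save_image(fake_png, "icon", tmp)
    print(saved.name)  # icon.png
```

With a real response, the call would look like `save_image(response.content, "icon")`. Checking `response.status_code` before saving is a complementary step.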

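Step 5: Display test

import cv2

img = cv2.imread("icon.ico")  # the format is detected from the file content, not the extension
cv2.imshow('icon', img)
cv2.waitKey(0)  # keep the window open until a key is pressed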
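Displaying the image requires a desktop session, and `cv2.imread` returns `None` instead of raising when a file cannot be decoded. As a complementary check that works without a window, the sketch below reads the width and height straight from a PNG header using the standard library. It assumes the downloaded file is a PNG, which the article does not state; the `png_size` function is hypothetical, and `sample` is a hand-built header rather than a complete image.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(data):
    """Return (width, height) from a PNG header, or None if data is not a PNG."""
    # Layout: signature (8 bytes), chunk length (4), chunk type "IHDR" (4),
    # then width (4) and height (4) as big-endian unsigned integers.
    if len(data) < 24 or not data.startswith(PNG_SIGNATURE) or data[12:16] != b"IHDR":
        return None
    width, height = struct.unpack(">II", data[16:24])
    return width, height


# Hand-built header of a hypothetical 64x64 PNG (not a complete image file).
sample = PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR" + struct.pack(">II", 64, 64)

print(png_size(sample))        # (64, 64)
print(png_size(b"not a png"))  # None
```

For the file saved in step 4, the bytes could be loaded with `open('icon.ico', 'rb').read()` and passed to `png_size`. A `None` result means the file is not a PNG, not necessarily that the download failed.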


Core Code

import requests
import cv2

headers = {'User-Agent': 'python-requests/2.25.1', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}
# Download the avatar image.
response = requests.get('https://avatars.githubusercontent.com/nplasterer?s=64&v=4', headers=headers)
print(response.request.headers)
# Save the raw response bytes to a local file.
with open('icon.ico', 'wb') as f:
    f.write(response.content)
print("Image downloaded successfully")
# Load the saved file and show it in a window until a key is pressed.
img = cv2.imread("icon.ico")
cv2.imshow('icon', img)
cv2.waitKey(0)