Calling shell commands from Python: updating novels daily with crontab + shell + Python

2023-10-06

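Linux's crontab can run a series of commands on a flexible schedule, either at regular intervals or at specific times. This article uses it to download new chapters of web novels every day.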

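1. Crontab setting:

Run every day at 8:30 a.m.:

```
30 8 * * * /bin/bash /home/topgrec/source/ddlbook.sh
```

cron runs each entry with `/bin/sh` by default, and `source` is not available where `sh` is dash (as on Debian and Ubuntu), so the script is started with bash explicitly. cron already runs the job detached from any terminal, so a trailing `&` is unnecessary.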
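To make the five time fields concrete (minute, hour, day of month, month, day of week), here is a minimal Python sketch of how an entry such as `30 8 * * *` is matched against the clock. The function names `field_matches` and `cron_matches` are hypothetical, and the sketch handles only `*` and single numbers; real cron also supports ranges, lists and steps, and treats day of month and day of week differently when both are restricted.

```python
from datetime import datetime


def field_matches(field: str, value: int) -> bool:
    """Match one cron field; only '*' and a single integer are supported here."""
    return field == "*" or int(field) == value


def cron_matches(expr: str, when: datetime) -> bool:
    """Return True if `when` falls on a minute selected by a 5-field cron expression."""
    minute, hour, dom, month, dow = expr.split()
    # cron numbers weekdays 0-6 starting from Sunday; Python's weekday() has Monday == 0
    cron_dow = (when.weekday() + 1) % 7
    return (field_matches(minute, when.minute)
            and field_matches(hour, when.hour)
            and field_matches(dom, when.day)
            and field_matches(month, when.month)
            and field_matches(dow, cron_dow))


print(cron_matches("30 8 * * *", datetime(2023, 10, 6, 8, 30)))   # True
print(cron_matches("30 8 * * *", datetime(2023, 10, 6, 20, 30)))  # False
```

Because the day-of-month, month and day-of-week fields are all `*`, only the minute and hour decide whether the job fires, which is why the entry runs once a day.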

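2. Contents of the shell script (`ddlbook.sh`):

```bash
#!/bin/bash
# cron starts jobs with a minimal PATH, and the script needs Python 3
# (it calls open(..., encoding=...)), so name the interpreter explicitly
python3 /home/topgrec/source/python/downloadbooks.py
```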
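The shell script starts Python; the other direction in the title, Python calling a shell command, is how `downloadbooks.py` creates its output file (`os.system` running `touch`). A minimal sketch of the same step with `subprocess.run`, assuming a Linux system with `touch` on the PATH; the helper name `ensure_file` is hypothetical. Passing an argument list bypasses the shell, so spaces or quotes in the file name need no escaping, and `check=True` raises `CalledProcessError` if `touch` fails.

```python
import os
import subprocess
import tempfile


def ensure_file(filename: str) -> None:
    """Create `filename` with the shell command `touch` if it does not exist yet."""
    if not os.path.exists(filename):
        # Argument list: no shell parsing, so the name is passed through unchanged
        subprocess.run(["touch", filename], check=True)


# Usage: a file name containing a space and CJK characters
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "my book 豪婿.txt")
    ensure_file(path)
    print(os.path.exists(path))  # True
```

If no external command is needed at all, `pathlib.Path(filename).touch(exist_ok=True)` does the same job in pure Python.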

3. Python code (`downloadbooks.py`):

Source site: 筆趣網 (https://www.bbiquge.com/)
Books: 透視醫圣 and 豪婿

```python
import os
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/63.0.3239.84 Safari/537.36'
}

# Books to update: (local text file, chapter-list URL)
books = [
    ('/home/topgrec/books/透視醫圣.txt', 'https://www.bbiquge.com/book_46894/'),  # Book: 透視醫圣
    ('/home/topgrec/books/豪婿.txt', 'https://www.bbiquge.com/book_124646/'),     # Book: 豪婿
]


def getdetails(info, rooturl, filename):
    """Download one chapter and append it to the book's text file."""
    titlename = info.text
    # urljoin handles both relative links and links starting with '/'
    linkurl = urljoin(rooturl, info.get('href'))
    bookcontent = requests.get(linkurl, headers=headers).content.decode('gbk')
    mysoup = BeautifulSoup(bookcontent.replace('&nbsp;', ''), 'lxml')
    content_div = mysoup.find('div', {'id': 'content'})
    if content_div is None:  # page layout changed or the request was blocked
        return
    bookcontent = content_div.text.replace(
        '一秒記住【筆趣閣 www.bbiquge.com】，精彩小說無彈窗免費閱讀！', ''
    ).replace('\xa0', ' ')
    with open(filename, 'a', encoding='utf-8') as fr:
        fr.write('\r\n' + titlename + '\r\n' + bookcontent)


for filename, rooturl in books:
    # Call a system command to create the file if it does not exist yet
    if not os.path.exists(filename):
        os.system('touch "{}"'.format(filename))
    with open(filename, 'r', encoding='utf-8') as fp:
        allcontent = fp.read()
    # Get the list of all chapters
    response = requests.get(rooturl, headers=headers)
    soup = BeautifulSoup(response.content.decode('gbk'), 'lxml')
    booklist = soup.select('dd a')
    # Download a chapter only if its title is not already in the file
    # (a plain substring test; re.search would treat the title as a regex)
    for info in booklist:
        if info.text not in allcontent:
            getdetails(info, rooturl, filename)
```
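Deciding whether a chapter is already saved comes down to searching the file text for the chapter title. Doing that with `re.search(title, allcontent)` treats the title as a regular expression, so titles containing characters such as `(`, `)`, `?` or `[` can be missed, or raise `re.error` when unbalanced. The small sketch below illustrates this with made-up sample text; `is_downloaded` is a hypothetical helper name.

```python
import re


def is_downloaded(title: str, allcontent: str) -> bool:
    """Plain substring test: the title is compared literally, with no regex syntax."""
    return title in allcontent


# Made-up book file contents for illustration
allcontent = "\r\n第1章 開端\r\n...\r\n第2章 (上)\r\n..."
title = "第2章 (上)"

# As a pattern, "(上)" is a group matching just "上", so the search looks for
# "第2章 上" and misses the chapter that is already saved
print(bool(re.search(title, allcontent)))             # False
print(bool(re.search(re.escape(title), allcontent)))  # True
print(is_downloaded(title, allcontent))               # True
```

Either a substring test or `re.escape` fixes the problem. Note that any text search can still report a false positive if a chapter title also happens to appear inside an earlier chapter's text.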

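4. The generated .txt files are UTF-8 and can be converted with the shell script below. Once a file has been converted, `downloadbooks.py` can no longer read it as UTF-8, so run the conversion only on files that are no longer being updated, or on copies.

UTF-8 to GBK: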

```bash
#!/bin/bash
directory="/home/topgrec/books"
f_encoding="utf-8"
t_encoding="gbk"
for file in "$directory"/*.txt
do
    if [ -f "$file" ]
    then
        fname=$(basename "$file")
        # Replace the original only if iconv succeeded; otherwise drop the temp file
        iconv -f "$f_encoding" -t "$t_encoding" "$file" -o "$directory/iconv_$fname" \
            && mv "$directory/iconv_$fname" "$file" \
            || rm -f "$directory/iconv_$fname"
    fi
done
```
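The same conversion can also be done in Python without calling `iconv`. This is a minimal sketch, assuming the books directory from the article; the function name `convert_to_gbk` is hypothetical. Like the `&& mv` step, it writes a temporary `iconv_` file and replaces the original only when the whole text decodes as UTF-8 and encodes as GBK, so a file that is already GBK or contains characters GBK lacks (such as emoji) is left unchanged.

```python
from pathlib import Path


def convert_to_gbk(path: Path) -> bool:
    """Re-encode a UTF-8 text file as GBK in place; return False if it was left unchanged."""
    try:
        data = path.read_text(encoding="utf-8").encode("gbk")
    except (UnicodeDecodeError, UnicodeEncodeError):
        return False  # not valid UTF-8, or not representable in GBK
    tmp = path.with_name("iconv_" + path.name)
    tmp.write_bytes(data)
    tmp.replace(path)  # same role as the shell script's mv
    return True


books_dir = Path("/home/topgrec/books")  # directory used in the article
if books_dir.is_dir():
    for txt in books_dir.glob("*.txt"):
        convert_to_gbk(txt)
```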