Introduction to a Python Example
1. Overview
1.1 Introduction
This program uses Python with requests and BeautifulSoup to crawl the Shodan search engine, filter out malware IPs, and search by country, service, organization, and product to retrieve detailed information for each IP. The general approach: use requests to construct form data, log in, and send requests to the backend; then, once the page source has been retrieved, use BeautifulSoup and regular expressions to filter the text for the information we need.
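In outline, the login-then-parse flow looks like the minimal sketch below (Python 2, mirroring the full program in section 2.5; the account credentials are placeholders):
import requests
from bs4 import BeautifulSoup

# POST the login form; 'continue' tells Shodan which page to land on afterwards
login_data={
    'username':'xxxxxxxxxx',   # placeholder Shodan username
    'password':'xxxxxxxxxx',   # placeholder Shodan password
    'grant_type':'password',
    'continue':'https://www.shodan.io/search?query=category%3A%22malware%22',
    'login_submit':'Log in'
}
resp=requests.post('https://account.shodan.io/login',params=login_data,
                   headers={'User-Agent':'Mozilla/5.0'})
# Parse the returned result page, then filter it with BeautifulSoup and regexes
soup=BeautifulSoup(resp.text,'html.parser')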
2. How It Works
2.1 About Shodan
Shodan is a search engine, but unlike Google, Shodan does not search for web pages; it goes straight into the Internet's back channels. You could call Shodan a "dark" Google: it is constantly looking for every server, camera, printer, router, and other device connected to the Internet. Every month, Shodan collects information from roughly 500 million devices, day and night.
The information Shodan gathers is staggering. Any traffic light, security camera, home-automation device, or heating system connected to the Internet can be found with ease. Shodan users have discovered the control system of a water park, a gas station, and even a hotel's wine cooler, and researchers have used Shodan to locate the command-and-control systems of nuclear power plants and a cyclotron.
Shodan's truly notable capability is that it can find almost anything connected to the Internet. What makes Shodan genuinely frightening is that almost none of these devices have any security defenses installed, so they can be accessed at will.
2.2 About Requests
requests is an HTTP client library for Python, similar to urllib and urllib2. So why use requests rather than urllib2? The official documentation puts it this way:
Python's standard library urllib2 provides most of the HTTP functionality you need, but its API is dreadful: a simple task takes a pile of code. In the spirit of laziness, I prefer the third-party requests library.
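To make the difference concrete, here is a minimal sketch of the same GET request done both ways (Python 2; httpbin.org is just a stand-in echo service):
# urllib2: building a request with custom headers takes several steps
import urllib2
req=urllib2.Request('http://httpbin.org/get',headers={'User-Agent':'demo'})
print(urllib2.urlopen(req).read())

# requests: the same thing in one call
import requests
print(requests.get('http://httpbin.org/get',headers={'User-Agent':'demo'}).text)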
A quick look at basic requests usage:
# Import the module
import requests
# Send a GET request
r = requests.get('http://www.zhidaow.com')
# Get the page source (.text is an attribute, not a method)
r.text
# Disable redirects
r=requests.get('http://www.baidu.com/link?url=QeTRFOS7TuUQRppa0wlTJJr6FfIYI1DJprJukx4Qy0XnsDO_s9baoO8u1wvjxgqN', allow_redirects = False)
# Get the response status code, to check whether the request succeeded
r.status_code
# Get the response headers
r.headers
# Send POST, PUT, DELETE, HEAD, OPTIONS requests
#r = requests.post("http://httpbin.org/post")
#r = requests.put("http://httpbin.org/put")
#r = requests.delete("http://httpbin.org/delete")
#r = requests.head("http://httpbin.org/get")
#r = requests.options("http://httpbin.org/get")
More usage examples can be found at http://www.zhidaow.com/post/python-requests-install-and-brief-introduction
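Since the crawler in section 2.5 logs in to Shodan by POSTing form data, here is a minimal sketch of that pattern (httpbin.org simply echoes back what it receives; the field names here are placeholders, not Shodan's):
import requests

payload={'username':'user','password':'pass'}   # placeholder form fields
headers={'User-Agent':'Mozilla/5.0'}
r=requests.post('http://httpbin.org/post',data=payload,headers=headers)
print(r.status_code)     # 200 if the request succeeded
print(r.json()['form'])  # httpbin echoes the submitted form data back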
2.3 About BeautifulSoup
BeautifulSoup is a Python library for extracting data from HTML and XML files. It works with your parser of choice to provide idiomatic ways of navigating, searching, and modifying the parse tree, and it can save you hours or even days of work.
A quick look at basic BeautifulSoup usage:
# First, define an HTML document
html_doc = """
The Dormouse's story
Once upon a time there were three little sisters; and their names were
Lacie and
and they lived at the bottom of a well.
...
# Import the BeautifulSoup module
from bs4 import BeautifulSoup
soup = BeautifulSoup(html_doc, 'html.parser')  # specify a parser explicitly
# A few simple ways to navigate the parsed data structure
soup.title  # get the <title> tag
# <title>The Dormouse's story</title>
soup.title.name
# u'title'
soup.title.string  # the text inside the <title> tag
# u'The Dormouse's story'
soup.title.parent.name  # the parent node of the <title> tag
# u'head'
soup.p  # the first <p> tag
# <p class="title"><b>The Dormouse's story</b></p>
soup.p['class']  # the class attribute of the <p> tag
# u'title'
soup.a  # the first <a> tag
# <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>
soup.find_all('a')  # find all <a> tags
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
#  <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
#  <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
soup.find(id="link3")
# <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>
Extracting all of the URLs found in the document's <a> tags:
for link in soup.find_all('a'):
    print(link.get('href'))
# http://example.com/elsie
# http://example.com/lacie
# http://example.com/tillie
Extracting all of the text from the document:
print(soup.get_text())
# The Dormouse's story
#
# The Dormouse's story
#
# Once upon a time there were three little sisters; and their names were
# Elsie,
# Lacie and
# Tillie;
# and they lived at the bottom of a well.
#
# ...
That covers the basics of BeautifulSoup. I like to use BeautifulSoup together with regular expressions, which saves time and effort.
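As a minimal sketch of that combination (reusing the html_doc defined above): find_all() accepts a compiled regular expression for attribute matching, and you can also run re.search over a tag's string form:
import re
from bs4 import BeautifulSoup

soup = BeautifulSoup(html_doc, 'html.parser')
# Pass a compiled regex to find_all() to match href values
for tag in soup.find_all(href=re.compile(r'example\.com/(elsie|lacie)')):
    print(tag.get_text())
# Elsie
# Lacie
# Or run a regex over the tag's string form to pull out an attribute
print(re.search(r'id="(link\d)"', str(soup.a)).group(1))
# link1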
2.4 Environment Setup
1. Set up Python
1) Install the Python package
Download: https://www.python.org/downloads/release/python-2711/
2) Add the corresponding variable name and value to the system environment variables
Variable name: Path
Variable value: C:\Python27 (i.e., the Python installation path)
3) Verify that Python is configured
Run python -V in cmd; if it prints the following, the configuration is complete:
C:\Users\Administrator>python -V
Python 2.7.11
2. Install pip
Open C:\Python27\Scripts and check whether the Python installation includes pip. If so, add C:\Python27\Scripts to the system environment variables, then open a CMD window and run the pip command to confirm that pip works.
3. Install requests and bs4
1) Install requests via pip
pip install requests
2) Install bs4
pip install bs4
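A quick way to confirm both installs succeeded (this just imports each library and prints its version):
python -c "import requests, bs4; print(requests.__version__); print(bs4.__version__)"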
2.5 Program Code:
#_*_coding:utf-8_*_
import requests
from bs4 import BeautifulSoup
import re
import time
import sys
reload(sys)
sys.setdefaultencoding('utf-8')  # Python 2 hack: default encoding for writing the log
url_infos=[]        # login payloads for the per-IP detail pages
ip_list=[]          # IPs scraped from the search result pages
searchfor_list=[]   # the search query each IP was found under
addon_list=[]       # 'Added on' timestamps from the result pages
information=[]      # parsed detail records, one dict per IP
ports=[2280,1604,1177,16464,443]
organizations=['George+Mason+University','Enzu','Psychz+Networks','Turk+Telekom','HOPUS+SAS']
products=['Gh0st+RAT+trojan','DarkComet+trojan','njRAT+trojan','ZeroAccess+trojan','XtremeRAT+trojan']
page_nums=range(1,6)
countries=['AE','AF','AL','AM','AO','AR','AT','AU','AZ','BD','BE','BF','BG','BH','BI','BJ','BL','BN','BO','BR',
'BW','BY','CA','CF','CG','CH','CL','CM','CN','CO','CR','CS','CU','CY','DE','DK','DO','DZ','EC','EE','EG','ES',
'ET','FI','FJ','FR','GA','GB','GD','GE','GH','GN','GR','GT','HK','HN','HU','ID','IE','IL','IN','IQ','IR','IS',
'IT','JM','KG','KH','KP','KR','KT','KW','KZ','LA','LB','LC','LI','LK','LR','LT','LU','LV','LY','MA','MC','MD',
'MG','ML','MM','MN','MO','MT','MU','MW','MX','MY','MZ','NA','NE','NG','NI','NL','NO','NP','NZ','OM','PA','PE',
'PG','PH','PK','PL','PT','PY','QA','RO','RU','SA','SC','SD','SE','SG','SI','SK','SM','SN','SO','SY','SZ','TD',
'TG','TH','TJ','TM','TN','TR','TW','TZ','UA','UG','US','UY','UZ','VC','VE','VN','YE','YU','ZA','ZM','ZR','ZW']
class Get_IP():
    def __init__(self,url,headers):
        self.headers=headers
        self.url=url
    def get_info(self):
        for data in datas:
            # Log in and land on the search result page named in data['continue']
            req=requests.post(self.url,params=data,headers=self.headers)
            html=req.text
            pattern0=re.compile(r'value=\'(.*?)\'/')
            soup=BeautifulSoup(html,'html.parser')
            Add_on=soup.find_all(text=re.compile(r'Added\ on'))
            # The search box echoes back the query this page was searched for
            for i in soup.select('input[id="search_input"]'):
                text_b=re.search(pattern0,str(i)).group(1)
            for i in Add_on:
                addon_list.append(i)
            # Result links of the form /host/<ip> carry the IP addresses
            for i in soup.find_all(href=re.compile(r'/host/')):
                if 'Details' not in i.get_text():
                    ip_list.append(i.get_text())
                    searchfor_list.append(text_b)
        return ip_list,searchfor_list,addon_list
class Get_Ip_Info():
    def __init__(self,url,headers):
        self.url=url
        self.headers=headers
    def get_ip_info(self):
        for data in url_infos:
            # Log in and land on the host detail page for one IP
            req_ip=requests.post(self.url,params=data,headers=self.headers)
            html_ip=req_ip.text
            soup=BeautifulSoup(html_ip,'html.parser')
            tag_text=[]
            tag_content=[]
            for i in soup.find_all('th'):
                tag_text.append(i.get_text())
            for i in soup.find_all('td'):
                tag_content.append(i.get_text())
            # The twitter:description meta tag lists the host's open ports
            for i in soup.select('meta[name="twitter:description"]'):
                pattern=re.compile(r'content="Ports open:(.*?)"')
                ports=re.search(pattern,str(i)).group(1)
            info=dict(zip(tag_content,tag_text))
            info['Ports']=ports
            # replace(), not strip(): strip() treats its argument as a set of characters
            info['Ip']=data['continue'].replace('https://www.shodan.io/host/','')
            information.append(info)
if __name__=="__main__":
headers={'User-Agent':"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:25.0) ""Gecko/20100101 Firefox/25.0"}
url='https://account.shodan.io/login'
file_path=r'C:\Users\Administrator\Downloads\shoudan%s.txt'%time.strftime('%Y-%m-%d',time.localtime(time.time()))
file_w=open(file_path,'wb')
all_url=[]
datas=[]
for country in countries:
for page_num in page_nums:
home_url='https://www.shodan.io/search?query=category%3A%22malware%22+country%3A%22'+'%s'%country+'%22&'+'page=%d'%page_num
all_url.append(home_url)
for port in ports:
for page_num in page_nums:
home_url='https://www.shodan.io/search?query=category%3A%22malware%22+port%3A%22'+'%d'%port+'%22&'+'page=%d'%page_num
all_url.append(home_url)
for org in organizations:
for page_num in page_nums:
home_url='https://www.shodan.io/search?query=category%3A%22malware%22+org%3A%22'+'%s'%org+'%22&'+'page=%d'%page_num
all_url.append(home_url)
for product in products:
for page_num in page_nums:
home_url='https://www.shodan.io/search?query=category%3A%22malware%22+product%3A%22'+'%s'%product+'%22&'+'page=%d'%page_num
all_url.append(home_url)
for continue_url in all_url:
info={
'username':'xxxxxxxxxx',#Shodan賬戶名
'password':'xxxxxxxxxx',#Shodan密碼
'grant_type':'password',
'continue':continue_url,
'login_submit':'Log in'
}
datas.append(info)
app=Get_IP(url,headers)
app.get_info()
for ip in ip_list:
url_ip='https://www.shodan.io/host/'+'%s'%ip
url_info={
'username':'xxxxxxxxxx',#Shodan賬戶名
'password':'xxxxxxxxxx',#Shodan密碼
'grant_type':'password',
'continue':url_ip,
'login_submit':'Log in'
}
url_infos.append(url_info)
app=Get_Ip_Info(url,headers)
app.get_ip_info()
total_info=zip(searchfor_list,information,addon_list)
for i in range(len(total_info)):
search_for=total_info[i][0]
add_on=total_info[i][2]
ip_info=total_info[i][1]['Ip']
try:
city_info=str(total_info[i][1]['City'])
except KeyError:
city_info='NULL'
try:
ports_info=str(total_info[i][1]['Ports'])
except KeyError:
ports_info='NULL'
try:
country_info=str(total_info[i][1]['Country'])
except KeyError:
country_info='NULL'
try:
hostnames_info=str(total_info[i][1]['Hostnames'])
except KeyError:
hostnames_info='NULL'
word=search_for+' ||' +country_info+' '+city_info+' ||'+hostnames_info+' ||'+ip_info+'||'+ports_info+' ||'+add_on
file_w.write(word+'\r\n')
file_w.close()
2.6 What It Does:
Based on the search criteria, the program collects the information Shodan turns up and produces a log file each day. The initial search condition is category:"malware", a search for malware; the search is then refined by country, service, organization, and product to collect every IP address involved. Finally, for each IP, the program visits the details page and returns that IP's detailed data (Ports, Ip, Hostnames, Country, City).
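Each log line follows the format assembled at the end of the program: search term, country and city, hostnames, IP, ports, and the 'Added on' date, joined with '||' separators. A hypothetical line (every value below is made up for illustration) might look like:
category:"malware" country:"US" ||United States NULL ||NULL ||1.2.3.4||1604 ||2016-05-01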