Introduction to a Python Example
1. Overview
1.1 Introduction
This program uses Python with requests and BeautifulSoup to crawl the Shodan search engine, filter out malware IPs, and search by country, service, organization, and product to retrieve detailed information for each IP. The general approach: use requests to construct form data, log in, and send requests to the backend; then, once the page source has been retrieved, use BeautifulSoup and regular expressions to filter the text for the information we need.
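In outline, the login-then-parse flow looks like the minimal sketch below (Python 2, mirroring the full program in section 2.5; the account credentials are placeholders):
import requests
from bs4 import BeautifulSoup

# POST the login form; 'continue' tells Shodan which page to land on afterwards
login_data={
    'username':'xxxxxxxxxx',   # placeholder Shodan username
    'password':'xxxxxxxxxx',   # placeholder Shodan password
    'grant_type':'password',
    'continue':'https://www.shodan.io/search?query=category%3A%22malware%22',
    'login_submit':'Log in'
}
resp=requests.post('https://account.shodan.io/login',params=login_data,
                   headers={'User-Agent':'Mozilla/5.0'})
# Parse the returned result page, then filter it with BeautifulSoup and regexes
soup=BeautifulSoup(resp.text,'html.parser')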
2. How It Works
2.1 About Shodan
Shodan is a search engine, but unlike Google, Shodan does not search for web pages; it goes straight into the Internet's back channels. You could call Shodan a "dark" Google: it is constantly looking for every server, camera, printer, router, and other device connected to the Internet. Every month, Shodan collects information from roughly 500 million devices, day and night.
The information Shodan gathers is staggering. Any traffic light, security camera, home-automation device, or heating system connected to the Internet can be found with ease. Shodan users have discovered the control system of a water park, a gas station, and even a hotel's wine cooler, and researchers have used Shodan to locate the command-and-control systems of nuclear power plants and a cyclotron.
Shodan's truly notable capability is that it can find almost anything connected to the Internet. What makes Shodan genuinely frightening is that almost none of these devices have any security defenses installed, so they can be accessed at will.
2.2 About Requests
requests is an HTTP client library for Python, similar to urllib and urllib2. So why use requests rather than urllib2? The official documentation puts it this way:
Python's standard library urllib2 provides most of the HTTP functionality you need, but its API is dreadful: a simple task takes a pile of code. In the spirit of laziness, I prefer the third-party requests library.
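To make the difference concrete, here is a minimal sketch of the same GET request done both ways (Python 2; httpbin.org is just a stand-in echo service):
# urllib2: building a request with custom headers takes several steps
import urllib2
req=urllib2.Request('http://httpbin.org/get',headers={'User-Agent':'demo'})
print(urllib2.urlopen(req).read())

# requests: the same thing in one call
import requests
print(requests.get('http://httpbin.org/get',headers={'User-Agent':'demo'}).text)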
A quick look at basic requests usage:
# Import the module
import requests
# Send a GET request
r = requests.get('http://www.zhidaow.com')
# Get the page source (.text is an attribute, not a method)
r.text
# Disable redirects
r=requests.get('http://www.baidu.com/link?url=QeTRFOS7TuUQRppa0wlTJJr6FfIYI1DJprJukx4Qy0XnsDO_s9baoO8u1wvjxgqN', allow_redirects = False)
# Get the response status code, to check whether the request succeeded
r.status_code
# Get the response headers
r.headers
# Send POST, PUT, DELETE, HEAD, OPTIONS requests
#r = requests.post("http://httpbin.org/post")
#r = requests.put("http://httpbin.org/put")
#r = requests.delete("http://httpbin.org/delete")
#r = requests.head("http://httpbin.org/get")
#r = requests.options("http://httpbin.org/get")
More usage examples can be found at http://www.zhidaow.com/post/python-requests-install-and-brief-introduction
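Since the crawler in section 2.5 logs in to Shodan by POSTing form data, here is a minimal sketch of that pattern (httpbin.org simply echoes back what it receives; the field names here are placeholders, not Shodan's):
import requests

payload={'username':'user','password':'pass'}   # placeholder form fields
headers={'User-Agent':'Mozilla/5.0'}
r=requests.post('http://httpbin.org/post',data=payload,headers=headers)
print(r.status_code)     # 200 if the request succeeded
print(r.json()['form'])  # httpbin echoes the submitted form data back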
2.3 About BeautifulSoup
BeautifulSoup is a Python library for extracting data from HTML and XML files. It works with your parser of choice to provide idiomatic ways of navigating, searching, and modifying the parse tree, and it can save you hours or even days of work.
A quick look at basic BeautifulSoup usage:
# First, define an HTML document
html_doc = """
The Dormouse's story
Once upon a time there were three little sisters; and their names were
Lacie and
and they lived at the bottom of a well.
...
# Import the BeautifulSoup module
from bs4 import BeautifulSoup
soup = BeautifulSoup(html_doc, 'html.parser')  # specify a parser explicitly
# A few simple ways to navigate the parsed data structure
soup.title  # get the <title> tag
# <title>The Dormouse's story</title>
soup.title.name
# u'title'
soup.title.string  # the text inside the <title> tag
# u'The Dormouse's story'
soup.title.parent.name  # the parent node of the <title> tag
# u'head'
soup.p  # the first <p> tag
# <p class="title"><b>The Dormouse's story</b></p>
soup.p['class']  # the class attribute of the <p> tag
# u'title'
soup.a  # the first <a> tag
# <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>
soup.find_all('a')  # find all <a> tags
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
#  <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
#  <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
soup.find(id="link3")
# <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>
Extracting all of the URLs found in the document's <a> tags:
for link in soup.find_all('a'):
    print(link.get('href'))
# http://example.com/elsie
# http://example.com/lacie
# http://example.com/tillie
Extracting all of the text from the document:
print(soup.get_text())
# The Dormouse's story
#
# The Dormouse's story
#
# Once upon a time there were three little sisters; and their names were
# Elsie,
# Lacie and
# Tillie;
# and they lived at the bottom of a well.
#
# ...
That covers the basics of BeautifulSoup. I like to use BeautifulSoup together with regular expressions, which saves time and effort.
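As a minimal sketch of that combination (reusing the html_doc defined above): find_all() accepts a compiled regular expression for attribute matching, and you can also run re.search over a tag's string form:
import re
from bs4 import BeautifulSoup

soup = BeautifulSoup(html_doc, 'html.parser')
# Pass a compiled regex to find_all() to match href values
for tag in soup.find_all(href=re.compile(r'example\.com/(elsie|lacie)')):
    print(tag.get_text())
# Elsie
# Lacie
# Or run a regex over the tag's string form to pull out an attribute
print(re.search(r'id="(link\d)"', str(soup.a)).group(1))
# link1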
2.4 Environment Setup
1. Set up Python
1) Install the Python package
Download: https://www.python.org/downloads/release/python-2711/
2) Add the corresponding variable name and value to the system environment variables
Variable name: Path
Variable value: C:\Python27 (i.e., the Python installation path)
3) Verify that Python is configured
Run python -V in cmd; if it prints the following, the configuration is complete:
C:\Users\Administrator>python -V
Python 2.7.11
2. Install pip
Open C:\Python27\Scripts and check whether the Python installation includes pip. If so, add C:\Python27\Scripts to the system environment variables, then open a CMD window and run the pip command to confirm that pip works.
3. Install requests and bs4
1) Install requests via pip
pip install requests
2) Install bs4
pip install bs4
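A quick way to confirm both installs succeeded (this just imports each library and prints its version):
python -c "import requests, bs4; print(requests.__version__); print(bs4.__version__)"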
2.5 Program Code:
#_*_coding:utf-8_*_
import requests
from bs4 import BeautifulSoup
import re
import time
import sys
reload(sys)
sys.setdefaultencoding('utf-8')  # Python 2 hack: default encoding for writing the log
url_infos=[]        # login payloads for the per-IP detail pages
ip_list=[]          # IPs scraped from the search result pages
searchfor_list=[]   # the search query each IP was found under
addon_list=[]       # 'Added on' timestamps from the result pages
information=[]      # parsed detail records, one dict per IP
ports=[2280,1604,1177,16464,443]
organizations=['George+Mason+University','Enzu','Psychz+Networks','Turk+Telekom','HOPUS+SAS']
products=['Gh0st+RAT+trojan','DarkComet+trojan','njRAT+trojan','ZeroAccess+trojan','XtremeRAT+trojan']
page_nums=range(1,6)
countries=['AE','AF','AL','AM','AO','AR','AT','AU','AZ','BD','BE','BF','BG','BH','BI','BJ','BL','BN','BO','BR',
'BW','BY','CA','CF','CG','CH','CL','CM','CN','CO','CR','CS','CU','CY','DE','DK','DO','DZ','EC','EE','EG','ES',
'ET','FI','FJ','FR','GA','GB','GD','GE','GH','GN','GR','GT','HK','HN','HU','ID','IE','IL','IN','IQ','IR','IS',
'IT','JM','KG','KH','KP','KR','KT','KW','KZ','LA','LB','LC','LI','LK','LR','LT','LU','LV','LY','MA','MC','MD',
'MG','ML','MM','MN','MO','MT','MU','MW','MX','MY','MZ','NA','NE','NG','NI','NL','NO','NP','NZ','OM','PA','PE',
'PG','PH','PK','PL','PT','PY','QA','RO','RU','SA','SC','SD','SE','SG','SI','SK','SM','SN','SO','SY','SZ','TD',
'TG','TH','TJ','TM','TN','TR','TW','TZ','UA','UG','US','UY','UZ','VC','VE','VN','YE','YU','ZA','ZM','ZR','ZW']
class Get_IP():
    def __init__(self,url,headers):
        self.headers=headers
        self.url=url
    def get_info(self):
        for data in datas:
            # Log in and land on the search result page named in data['continue']
            req=requests.post(self.url,params=data,headers=self.headers)
            html=req.text
            pattern0=re.compile(r'value=\'(.*?)\'/')
            soup=BeautifulSoup(html,'html.parser')
            Add_on=soup.find_all(text=re.compile(r'Added\ on'))
            # The search box echoes back the query this page was searched for
            for i in soup.select('input[id="search_input"]'):
                text_b=re.search(pattern0,str(i)).group(1)
            for i in Add_on:
                addon_list.append(i)
            # Result links of the form /host/<ip> carry the IP addresses
            for i in soup.find_all(href=re.compile(r'/host/')):
                if 'Details' not in i.get_text():
                    ip_list.append(i.get_text())
                    searchfor_list.append(text_b)
        return ip_list,searchfor_list,addon_list
class Get_Ip_Info():
    def __init__(self,url,headers):
        self.url=url
        self.headers=headers
    def get_ip_info(self):
        for data in url_infos:
            # Log in and land on the host detail page for one IP
            req_ip=requests.post(self.url,params=data,headers=self.headers)
            html_ip=req_ip.text
            soup=BeautifulSoup(html_ip,'html.parser')
            tag_text=[]
            tag_content=[]
            for i in soup.find_all('th'):
                tag_text.append(i.get_text())
            for i in soup.find_all('td'):
                tag_content.append(i.get_text())
            # The twitter:description meta tag lists the host's open ports
            for i in soup.select('meta[name="twitter:description"]'):
                pattern=re.compile(r'content="Ports open:(.*?)"')
                ports=re.search(pattern,str(i)).group(1)
            info=dict(zip(tag_content,tag_text))
            info['Ports']=ports
            # replace(), not strip(): strip() treats its argument as a set of characters
            info['Ip']=data['continue'].replace('https://www.shodan.io/host/','')
            information.append(info)
if __name__=="__main__":
headers={'User-Agent':"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:25.0) ""Gecko/20100101 Firefox/25.0"}
url='https://account.shodan.io/login'
file_path=r'C:\Users\Administrator\Downloads\shoudan%s.txt'%time.strftime('%Y-%m-%d',time.localtime(time.time()))
file_w=open(file_path,'wb')
all_url=[]
datas=[]
for country in countries:
for page_num in page_nums:
home_url='https://www.shodan.io/search?query=category%3A%22malware%22+country%3A%22'+'%s'%country+'%22&'+'page=%d'%page_num
all_url.append(home_url)
for port in ports:
for page_num in page_nums:
home_url='https://www.shodan.io/search?query=category%3A%22malware%22+port%3A%22'+'%d'%port+'%22&'+'page=%d'%page_num
all_url.append(home_url)
for org in organizations:
for page_num in page_nums:
home_url='https://www.shodan.io/search?query=category%3A%22malware%22+org%3A%22'+'%s'%org+'%22&'+'page=%d'%page_num
all_url.append(home_url)
for product in products:
for page_num in page_nums:
home_url='https://www.shodan.io/search?query=category%3A%22malware%22+product%3A%22'+'%s'%product+'%22&'+'page=%d'%page_num
all_url.append(home_url)
for continue_url in all_url:
info={
'username':'xxxxxxxxxx',#Shodan賬戶名
'password':'xxxxxxxxxx',#Shodan密碼
'grant_type':'password',
'continue':continue_url,
'login_submit':'Log in'
}
datas.append(info)
app=Get_IP(url,headers)
app.get_info()
for ip in ip_list:
url_ip='https://www.shodan.io/host/'+'%s'%ip
url_info={
'username':'xxxxxxxxxx',#Shodan賬戶名
'password':'xxxxxxxxxx',#Shodan密碼
'grant_type':'password',
'continue':url_ip,
'login_submit':'Log in'
}
url_infos.append(url_info)
app=Get_Ip_Info(url,headers)
app.get_ip_info()
total_info=zip(searchfor_list,information,addon_list)
for i in range(len(total_info)):
search_for=total_info[i][0]
add_on=total_info[i][2]
ip_info=total_info[i][1]['Ip']
try:
city_info=str(total_info[i][1]['City'])
except KeyError:
city_info='NULL'
try:
ports_info=str(total_info[i][1]['Ports'])
except KeyError:
ports_info='NULL'
try:
country_info=str(total_info[i][1]['Country'])
except KeyError:
country_info='NULL'
try:
hostnames_info=str(total_info[i][1]['Hostnames'])
except KeyError:
hostnames_info='NULL'
word=search_for+' ||' +country_info+' '+city_info+' ||'+hostnames_info+' ||'+ip_info+'||'+ports_info+' ||'+add_on
file_w.write(word+'\r\n')
file_w.close()
2.6 What It Does:
Based on the search criteria, the program collects the information Shodan turns up and produces a log file each day. The initial search condition is category:"malware", a search for malware; the search is then refined by country, service, organization, and product to collect every IP address involved. Finally, for each IP, the program visits the details page and returns that IP's detailed data (Ports, Ip, Hostnames, Country, City).
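Each log line follows the format assembled at the end of the program: search term, country and city, hostnames, IP, ports, and the 'Added on' date, joined with '||' separators. A hypothetical line (every value below is made up for illustration) might look like:
category:"malware" country:"US" ||United States NULL ||NULL ||1.2.3.4||1604 ||2016-05-01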