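Recently I did two things: I bought a 1 TB hard drive, and I bought a Baidu Cloud membership. The problem was that I couldn't find anything to download, so I had no choice but to set up a magnet-link search engine that crawls links, and then look for resources there.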
When it comes to magnet-link search engines, the best option is of course 手撕包菜 (SSBC).
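The easiest way is to install it with the one-click script. Keep in mind that the server should preferably have more than 1 GB of RAM: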
wget --no-check-certificate https://raw.githubusercontent.com/banwagong-news/scripts/master/ssbc-setup.sh && bash ssbc-setup.sh
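If you are not sure whether your server meets the 1 GB recommendation, you can read MemTotal from /proc/meminfo before installing. The sketch below is only an illustration and is not part of SSBC. The helper names and the default threshold of 1,048,576 kB are my assumptions. A VPS sold as "1 GB" often reports a MemTotal slightly below that figure, because the kernel reserves some memory, so the threshold is a parameter.

```python
import os

# Assumed threshold: 1 GB expressed in kB, as reported by /proc/meminfo.
MIN_MEM_KB = 1024 * 1024


def mem_total_kb(meminfo_text):
    """Return MemTotal in kB, parsed from the contents of /proc/meminfo."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            # Line format: "MemTotal:        1014984 kB"
            return int(line.split()[1])
    raise ValueError("MemTotal not found")


def has_enough_memory(meminfo_text, minimum_kb=MIN_MEM_KB):
    """True if the machine's total memory is at least minimum_kb."""
    return mem_total_kb(meminfo_text) >= minimum_kb


if __name__ == "__main__":
    # Only meaningful on Linux, where /proc/meminfo exists.
    if os.path.exists("/proc/meminfo"):
        with open("/proc/meminfo") as f:
            text = f.read()
        print("MemTotal: %d kB, meets threshold: %s"
              % (mem_total_kb(text), has_enough_memory(text)))
```

Run on the server, it prints the total memory and whether it reaches the threshold. Pass a lower `minimum_kb` if you want to accept a nominal 1 GB machine.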
The script will then ask you for the following information:
Enter the site domain(s), separating multiple domains with spaces: 192.168.1.149
Are you sure the site http://192.168.1.149 is reachable from a browser? [y/n] y
Username (leave blank to use 'root'): root
Email address: bboysoulcn@gmail.com
Password:
Password (again):
Superuser created successfully.
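After a while, data will start coming in. Note, however, that the server must be located outside mainland China; the reason for using an overseas server should be obvious to everyone.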
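By default, the MariaDB installed by the script does not accept logins from other machines. If you want to connect to it with a database client on your local machine, you have to enable remote root access. Root also has no password by default, so you should set one. Start with the root password by running: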
mysql_secure_installation
Then just follow the prompts:
[root@bboysoul-centos ssbc]# mysql_secure_installation
NOTE: RUNNING ALL PARTS OF THIS SCRIPT IS RECOMMENDED FOR ALL MariaDB SERVERS IN PRODUCTION USE! PLEASE READ EACH STEP CAREFULLY!
In order to log into MariaDB to secure it, we'll need the current password for the root user. If you've just installed MariaDB, and you haven't set the root password yet, the password will be blank, so you should just press enter here.
Enter current password for root (enter for none):
OK, successfully used password, moving on...
Setting the root password ensures that nobody can log into the MariaDB root user without the proper authorisation.
Set root password? [Y/n] y
New password:
Re-enter new password:
Password updated successfully!
Reloading privilege tables..
 ... Success!
By default, a MariaDB installation has an anonymous user, allowing anyone to log into MariaDB without having to have a user account created for them. This is intended only for testing, and to make the installation go a bit smoother. You should remove them before moving into a production environment.
Remove anonymous users? [Y/n] y
 ... Success!
Normally, root should only be allowed to connect from 'localhost'. This ensures that someone cannot guess at the root password from the network.
Disallow root login remotely? [Y/n] n
 ... skipping.
By default, MariaDB comes with a database named 'test' that anyone can access. This is also intended only for testing, and should be removed before moving into a production environment.
Remove test database and access to it? [Y/n] y
 - Dropping test database...
 ... Success!
 - Removing privileges on test database...
 ... Success!
Reloading the privilege tables will ensure that all changes made so far will take effect immediately.
Reload privilege tables now? [Y/n] y
 ... Success!
Cleaning up...
All done! If you've completed all of the above steps, your MariaDB installation should now be secure.
Thanks for using MariaDB!
Next, enable remote access to MariaDB. First, log in to MariaDB:
mysql -u root -p
Then run the following commands:
MariaDB [mysql]> use mysql
Database changed
MariaDB [mysql]> update user set Host='%' where Host='localhost';
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0
MariaDB [mysql]> flush privileges;
Query OK, 0 rows affected (0.00 sec)
MariaDB [mysql]>
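Before pointing a desktop client at the server, it can help to confirm that port 3306 is reachable at all. The sketch below only tests the TCP connection with Python's standard `socket` module; it does not log in. The function name is hypothetical, and 3306 is the default port used in the configuration files of this post. If it returns False after the change above, something may still be in the way, for example firewalld on CentOS 7, or MariaDB listening only on 127.0.0.1 (the `bind-address` setting in my.cnf); both are possibilities to check, not something I verified on this install.

```python
import socket


def port_open(host, port=3306, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves host and tries each returned address
        sock = socket.create_connection((host, port), timeout=timeout)
    except (socket.timeout, socket.error):
        # connection refused, unreachable, or timed out
        return False
    sock.close()
    return True
```

From your own machine, `port_open("192.168.1.149")` should return True once remote access works.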
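You can now log into the database remotely. Next, update the database password in SSBC's own configuration. First, stop the related processes: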
ps -ef |grep python
Typically you will see the following processes:
root       958     1  0 20:51 ?        00:00:00 /usr/bin/python -Es /usr/sbin/tuned -l -P
root      3604     1  0 21:13 pts/0    00:00:00 /usr/bin/python2 /usr/bin/gunicorn ssbc.wsgi:application -b 127.0.0.1:8000 --reload
root      3616  3604  1 21:13 pts/0    00:00:09 /usr/bin/python2 /usr/bin/gunicorn ssbc.wsgi:application -b 127.0.0.1:8000 --reload
root      3693     1 12 21:15 ?        00:01:30 python simdht_worker.py
root      3694     1  0 21:15 ?        00:00:00 python index_worker.py
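Kill the gunicorn, simdht_worker.py and index_worker.py processes (the first entry, tuned, is a CentOS system service and not part of SSBC, so leave it alone), then find and kill the searchd processes as well: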
ps -ef |grep search
root      3467     1  0 21:03 ?        00:00:00 searchd --config ./sphinx.conf
root      3468  3467  0 21:03 ?        00:00:02 searchd --config ./sphinx.conf
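Picking the right PIDs out of these two listings by hand is easy to get wrong (for example, killing tuned by accident). The sketch below filters `ps -ef` output by the command-line fragments seen above and sends each match a SIGTERM, the same default signal as a plain `kill`. The marker strings come from the listings in this post; the function names are hypothetical, and the sketch is an illustration rather than part of SSBC.

```python
import os
import signal

# Command-line fragments identifying SSBC's processes, taken from the ps output above.
SSBC_MARKERS = ("gunicorn ssbc.wsgi", "simdht_worker.py",
                "index_worker.py", "searchd --config")


def ssbc_pids(ps_output, markers=SSBC_MARKERS):
    """Return PIDs from `ps -ef` output whose command contains any marker."""
    pids = []
    for line in ps_output.splitlines():
        # ps -ef columns: UID PID PPID C STIME TTY TIME CMD (CMD may contain spaces)
        fields = line.split(None, 7)
        if len(fields) < 8 or not fields[1].isdigit():
            continue  # skip the header line and anything malformed
        if any(m in fields[7] for m in markers):
            pids.append(int(fields[1]))
    return pids


def stop(pids):
    """Send SIGTERM to each PID, ignoring processes that are already gone."""
    for pid in pids:
        try:
            os.kill(pid, signal.SIGTERM)
        except OSError:
            pass  # e.g. a gunicorn worker that exited together with its parent
```

On the server, something like `stop(ssbc_pids(subprocess.check_output(["ps", "-ef"]).decode()))` would stop gunicorn, both workers and searchd in one go, while leaving tuned untouched.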
Then edit the configuration files:
vim /root/ssbc/sphinx.conf
Add the database password:
sql_host = 127.0.0.1
sql_user = root
sql_pass =
sql_db = ssbc
sql_port = 3306 # optional, default is 3306
vi /root/ssbc/workers/index_worker.py
SRC_HOST = '127.0.0.1'
SRC_USER = 'root'
SRC_PASS = ''
DST_HOST = '127.0.0.1'
DST_USER = 'root'
DST_PASS = ''
Both passwords above (SRC_PASS and DST_PASS) need to be changed.
vi /root/ssbc/workers/simdht_worker.py
DB_HOST = '127.0.0.1'
DB_USER = 'root'
DB_PORT = 3306
DB_PASS = ''
DB_NAME = 'ssbc'
BLACK_FILE = 'black_list.txt'
vim /root/ssbc/ssbc/settings.py
Change the following: after 'USER': 'root', put the database password in the PASSWORD field:
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'ssbc',
        'HOST': '127.0.0.1',
        'PORT': 3306,
        'USER': 'root',
        'PASSWORD': 'woyaoxuehuilinux',
        'OPTIONS': {
            "init_command": "SET storage_engine=MYISAM",
        }
    }
}
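Since the same password has to go into four files, a small script can do the edits in one pass. This is a sketch, not part of SSBC: it assumes the default install paths under /root/ssbc used above, and that the lines look like the excerpts in this post. In the Python files the password is written with `repr()`, so it is always a valid string literal; in sphinx.conf it is written as-is, so the sketch assumes a password without `#` or backslashes, which Sphinx's config parser may treat specially. Back up the files before running it.

```python
import re
import sys

# Default paths from the steps above (assumption: SSBC installed under /root/ssbc).
SPHINX_CONF = "/root/ssbc/sphinx.conf"
PY_FILES = [
    "/root/ssbc/workers/index_worker.py",
    "/root/ssbc/workers/simdht_worker.py",
    "/root/ssbc/ssbc/settings.py",
]


def set_sphinx_pass(text, password):
    """Rewrite the value of the `sql_pass = ...` line in sphinx.conf."""
    return re.sub(r"^([ \t]*sql_pass[ \t]*=).*$",
                  lambda m: m.group(1) + " " + password, text, flags=re.M)


def set_python_pass(text, password):
    """Rewrite SRC_PASS / DST_PASS / DB_PASS and Django's 'PASSWORD' entry."""
    quoted = repr(password)  # always a valid Python string literal
    text = re.sub(r"^([ \t]*(?:SRC_PASS|DST_PASS|DB_PASS)[ \t]*=[ \t]*).*$",
                  lambda m: m.group(1) + quoted, text, flags=re.M)
    return re.sub(r"('PASSWORD'[ \t]*:[ \t]*)'[^']*'",
                  lambda m: m.group(1) + quoted, text)


def rewrite(path, func, password):
    """Apply func to the file's contents and write the result back."""
    with open(path) as f:
        text = f.read()
    with open(path, "w") as f:
        f.write(func(text, password))


if __name__ == "__main__" and len(sys.argv) == 2:
    new_password = sys.argv[1]
    rewrite(SPHINX_CONF, set_sphinx_pass, new_password)
    for p in PY_FILES:
        rewrite(p, set_python_pass, new_password)
```

Run it with the new password as its only argument, then start the processes again.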
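Migrating to a new server is actually simple: run the install script on the new machine first; once it finishes, drop the database and create a new ssbc database (make sure its encoding is UTF-8), then import the old database into it.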
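The migration above boils down to three shell commands. The helper below only prints them for a given database name and user, so they can be copied into a shell; it runs nothing itself. It is a sketch: the dump file name ssbc.sql and the utf8 / utf8_general_ci choice for "UTF-8 encoding" are my assumptions. With `-p`, both mysqldump and mysql prompt for the password.

```python
def migration_steps(db="ssbc", user="root", dump_file="ssbc.sql"):
    """Return (where to run, shell command) pairs for moving the SSBC database."""
    return [
        # 1. On the old server: dump the database to a SQL file.
        ("old server",
         "mysqldump -u %s -p --default-character-set=utf8 %s > %s"
         % (user, db, dump_file)),
        # 2. On the new server, after the install script: recreate the database as UTF-8.
        ("new server",
         'mysql -u %s -p -e "DROP DATABASE IF EXISTS %s; '
         'CREATE DATABASE %s DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;"'
         % (user, db, db)),
        # 3. On the new server: load the old dump into the fresh database.
        ("new server",
         "mysql -u %s -p --default-character-set=utf8 %s < %s"
         % (user, db, dump_file)),
    ]


if __name__ == "__main__":
    for where, cmd in migration_steps():
        print("# %s\n%s" % (where, cmd))
```

Copy ssbc.sql from the old server to the new one (for example with scp) between steps 1 and 3.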
For the FAQ below, I've simply copied and pasted it:
1. Is CentOS 7 required?
CentOS 7 is strongly recommended; CentOS 6 may run into unexpected errors.
2. How do I set the homepage keywords?
Log in to the admin backend, click "Rec keywordss", and add entries with the button in the top-right corner.
3. How do I view the files that have been stored?
Log in to the admin backend and click "Hashs".
4. How do I see how many files are stored each day, to gauge indexing throughput?
Log in to the admin backend and click "Status reports".
5. How do I confirm that the web server, the crawler and the indexer are running?
Run
ps -ef|grep python|grep -v grep
If the output contains
gunicorn ssbc.wsgi:application -b 127.0.0.1:8000 --reload
python simdht_worker.py
python index_worker.py
then they are running.
——————————————————————————————————————
Removing the ad in the bottom-right corner of the search page
[root@localhost ssbc-master]# cd web/static/js
[root@localhost js]# vi ssbc.js
Find the following 3 lines, comment them out by adding // in front, and save:
// document.write('<script src="http://v.6dvip.com/ge/?s=47688"><\/script>');
// document.writeln("<script language=\"JavaScript\" type=\"text/javascript\" src=\"http://js.6dad.com/js/xiaoxia.js\"></script>");
// document.writeln("<script language=\"JavaScript\" type=\"text/javascript\" src=\"http://js.ta80.com/js/12115.js\"></script>");
——————————————————————————————————————
How do I change how file extensions are categorized?
workers/metautils.py contains the following code:
def get_category(ext):
    ext = ext + '.'
    cats = {
        u'video': '.avi.mp4.rmvb.m2ts.wmv.mkv.flv.qmv.rm.mov.vob.asf.3gp.mpg.mpeg.m4v.f4v.',
        u'image': '.jpg.bmp.jpeg.png.gif.tiff.',
        u'document': '.pdf.isz.chm.txt.epub.bc!.doc.ppt.',
        u'music': '.mp3.ape.wav.dts.mdf.flac.',
        u'package': '.zip.rar.7z.tar.gz.iso.dmg.pkg.',
        u'software': '.exe.app.msi.apk.'
    }
This means that files with the extensions .exe, .app, .msi and .apk all belong to the software category.
If you change u'software': '.exe.app.msi.apk.' to u'software': 'app.msi.apk.', exe files will be classified as other instead.
So this is where you adjust the categorization.
——————————————————————————————————————
How do I prevent files of certain formats or categories from being stored?
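workers/metadata.py contains the following code:
info['extension'] = metautils.get_extension(bigfname).lower()
info['category'] = metautils.get_category(info['extension'])
So if you want to exclude files with the .exe extension, or files in the software category, you can add the following after that code:
########## added filters - start ##########
# filter by extension: do not store .exe files
if info['extension'] == 'exe':
    return  # return right away, skipping the storage code below
# filter by category: do not store files of type software
if info['category'] == 'software':
    return
# do not store files of type other
if info['category'] == 'other':
    return
########## added filters - end ##########
——————————————————————————————————————
How do I rebuild the index?
Step 1: delete the /data directory.
Step 2: in the database, set the tagged field of every record in the search_hash table to 0:
UPDATE search_hash SET tagged=0;
Then start sphinx and index_worker.py.
——————————————————————————————————————
What should I do about the "MySQL server has gone away" error?
After ssbc has been running for a while, about half an hour, it inexplicably stops crawling with the following error:
MySQL server has gone away
The message shows that ssbc has lost its connection to MySQL (MariaDB); the program then throws an exception and naturally can no longer insert data.
There are three ways to fix this:
Method 1: write a script that restarts the crawler periodically.
Method 2: modify the code so that it reconnects to MySQL whenever the connection drops.
Method 3: change the MySQL configuration to make the idle timeout wait_timeout longer.
——————————————————————————————————————
Where do I set the crawler threads, to make the crawler faster or slower?
In workers/simdht_worker.py, raise or lower MAX_QUEUE_LT, MAX_QUEUE_PT and max_node_qsize.
How do I turn off debug mode, or set up a 404 page?
See http://www.githubs.cn/post/19
——————————————————————————————————————
How do I add a Thunder (Xunlei) link to the search result page?
Add the following code to web/views.py to generate the Thunder link:
import base64
xunleiurl = 'AAmagnet:?xt=urn:btih:' + d['info']['info_hash'] + 'ZZ'
d['xunlei_url'] = 'thunder://' + base64.b64encode(xunleiurl)  # Python 2 str; under Python 3 b64encode needs bytes
In the template it can be used as {{xunlei_url}}. The code must go before return render(request, 'info.html', d).
——————————————————————————————————————
How do I move SSBC to a new server?
Export the database to SQL with mysqldump, run the one-click script on the new server, and then import that SQL.
——————————————————————————————————————
Fixing the "duplicate id 'xxxx'" error
In the database, run:
update search_hash set tagged=True where id=xxxx;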
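The categorization and filtering answers in the FAQ can be tried out together outside SSBC. The sketch below reuses the extension table from the get_category excerpt; the loop that does the lookup is my assumption (the excerpt stops at the table), with the fallback to other that the FAQ describes. `should_skip` and the two block lists are hypothetical and simply mirror the filter example added to metadata.py.

```python
# Extension table copied from the get_category excerpt in the FAQ.
CATS = {
    u'video': '.avi.mp4.rmvb.m2ts.wmv.mkv.flv.qmv.rm.mov.vob.asf.3gp.mpg.mpeg.m4v.f4v.',
    u'image': '.jpg.bmp.jpeg.png.gif.tiff.',
    u'document': '.pdf.isz.chm.txt.epub.bc!.doc.ppt.',
    u'music': '.mp3.ape.wav.dts.mdf.flac.',
    u'package': '.zip.rar.7z.tar.gz.iso.dmg.pkg.',
    u'software': '.exe.app.msi.apk.',
}

# Hypothetical block lists, mirroring the filter example in the FAQ.
BLOCKED_EXTENSIONS = {'exe'}
BLOCKED_CATEGORIES = {'software', 'other'}


def get_category(ext, cats=CATS):
    """Assumed lookup: 'exe' becomes 'exe.' and is searched for in each list."""
    ext = ext + '.'
    for cat, exts in cats.items():
        if ext in exts:
            return cat
    return u'other'  # unlisted extensions fall back to "other"


def should_skip(extension):
    """True if a file with this extension should not be stored."""
    ext = extension.lower()
    return ext in BLOCKED_EXTENSIONS or get_category(ext) in BLOCKED_CATEGORIES
```

Because the lookup is a plain substring test, a partial extension such as `p4` also matches `.mp4.`; searching for `'.' + ext + '.'` instead would be stricter, but then every list must keep its leading dot (the FAQ's `'app.msi.apk.'` example drops it).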
Follow Bboysoul's blog at www.bboysoul.com. Have Fun