Summary of the talk:
Web crawling is a hard problem and the web is messy. There is no shortage of semantic web standards -- basically, everyone has one. How do you make sense of the noise of our web of billions of pages?
This talk presents two key technologies that can be used: Scrapy, an open source & scalable web crawling framework, and Mr. Schemato, a new, open source semantic web validator and distiller.
The talk video is on Vimeo, and the slides are on Speaker Deck, or you can open them directly here in the browser. The slides were made with reST and S5; the source is on GitHub.
The speaker is Andrew Montalenti, co-founder/CTO of Parse.ly.
My notes after watching:
How he distinguishes three page-fetching-related verbs: Crawling, Spidering, Scraping
Parse.ly keeps more than 1 TB of production data in memory
They use Scrapy Cloud for development and testing, and Rackspace Cloud for production
A live demo of building a custom crawler with Scrapy
A demo of how they use Scrapy Cloud
An introduction to their open source project: Schemato - the unified validator for the next generation of metadata
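Schemato deals with validating and distilling page metadata such as Open Graph tags. As a rough, stdlib-only illustration of the kind of extraction such a tool performs (this is not Schemato's actual API; the class name and sample HTML below are hypothetical):

```python
# Stdlib-only sketch of metadata extraction, the kind of task a
# semantic-web distiller like Schemato automates and validates.
# Not Schemato's API; all names here are illustrative.
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collects Open Graph <meta property="og:*"> tags from a page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            d = dict(attrs)
            prop = d.get("property", "")
            if prop.startswith("og:"):
                self.meta[prop] = d.get("content", "")


# Hypothetical sample page with Open Graph metadata.
html_doc = """<html><head>
<meta property="og:title" content="Web Crawling at Scale">
<meta property="og:type" content="article">
</head><body></body></html>"""

parser = MetaExtractor()
parser.feed(html_doc)
print(parser.meta)
# → {'og:title': 'Web Crawling at Scale', 'og:type': 'article'}
```

A validator then checks the distilled dictionary against a schema, e.g. that an `article` page carries the properties the standard requires.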
Author: czhang