Installing [S]crape will install or upgrade the following Python libraries:
- argparse
- lxml
- cssselect
- PyYAML
- selenium
- [S]crape's version of envoy
- a [S]crape plugin library
Installation requires you have a compiler on your machine.
For Linux systems, this should already be the case.
For Macintosh OS/X systems, download Xcode for free from the Mac App Store (and also install the command line tools). Alternatively, you may be able to install just the command-line tools (see https://github.com/kennethreitz/osx-gcc-installer - we have not tried this).
For Windows platforms, this is not a straightforward process; see the MS Windows section at http://lxml.de/installation.html.
I suggest you use Python's virtualenv, particularly your first time with [S]crape (see virtualenv).
This will ensure you start with an isolated, clean Python install of [S]crape. Once you have this working, you may consider installing [S]crape into your system's Python site-packages.
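A minimal sketch of that workflow, assuming a Unix-like shell; the environment name scrape-env is arbitrary:

```shell
# Create an isolated environment for [S]crape. On Python 2, install the
# virtualenv package first ("pip install virtualenv") and create the
# environment with "virtualenv scrape-env"; on Python 3, the stdlib
# venv module shown here is equivalent.
python3 -m venv scrape-env
. scrape-env/bin/activate
which python    # should now point inside scrape-env
```

With the environment activated, the pip and python commands below operate inside scrape-env rather than your system install.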
To properly use virtualenv, you'll need pip. Ensure you have pip installed:
$ which pip
If you don’t have pip installed, then install it:
$ easy_install pip
If you do have pip, be sure it’s up-to-date:
$ pip install --upgrade pip
Todo: have yet to debug the scrape.gz install file (installation does not mirror setup.py).
Now, install the current version of [S]crape. Currently, you must do this from sources. Clone a copy of [S]crape and run setup.py:
$ hg clone ssh://hg@bitbucket.org/yarko/scrape
$ cd scrape
$ python setup.py install
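Once setup.py finishes, a quick sanity check (a sketch, not part of [S]crape itself) is to import the installed dependencies directly; note that module names are import names, so PyYAML imports as yaml:

```shell
# Confirm the installed dependencies import cleanly; a non-zero exit
# (and a traceback) means something is missing from the environment.
python -c "import argparse, lxml, cssselect, yaml, selenium" \
  && echo "dependencies OK"
```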
Footnotes
[1] Firefox is the only browser officially supported for [S]crape. As an alternative, you may try a current version of Chrome, but note that you will need to download a chrome-webdriver. For some combinations of versions of Chrome, chrome-webdriver, and selenium, timeouts did not work properly. For some medical journal sites with continuous stream advertising, Chrome would not respond (would never return when called from [S]crape).