[S]crape Installation

[S]crape requires the following:

  • Python
  • the Firefox web browser [1]

Installing [S]crape will install or upgrade the following Python libraries:

  • argparse
  • lxml
  • cssselect
  • PyYAML
  • selenium
  • [S]crape’s version of envoy
  • a [S]crape plugin library

Installation requires a C compiler on your machine, because lxml includes C extensions that are built during installation.

On Linux systems, a compiler is usually already installed.

On Mac OS X, download Xcode (free) from the Mac App Store and also install its command-line tools. Alternatively, you may be able to install just the command-line tools (see https://github.com/kennethreitz/osx-gcc-installer; we have not tried this).

On Windows, installation is not straightforward. See the MS Windows section at http://lxml.de/installation.html.
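Before installing, you can check whether a C compiler is on your PATH. The script below is a convenience sketch, not part of [S]crape: the helper name find_c_compiler is hypothetical, and the names it tries (cc, gcc, clang) are common defaults rather than a complete list. On recent OS X releases, xcode-select --install is the usual way to request the command-line tools.

```shell
#!/bin/sh
# Hypothetical helper: print the first C compiler found on PATH.
# lxml builds C extensions during installation, so one of these is needed.
find_c_compiler() {
    for cc_name in cc gcc clang; do
        if command -v "$cc_name" >/dev/null 2>&1; then
            echo "$cc_name"
            return 0
        fi
    done
    return 1
}

if compiler=$(find_c_compiler); then
    echo "Found C compiler: $compiler"
else
    echo "No C compiler found on PATH." >&2
    # On recent OS X releases this prompts to install the command-line tools.
    if [ "$(uname -s)" = "Darwin" ]; then
        echo "Try: xcode-select --install" >&2
    fi
fi
```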

Installing [S]crape

I suggest you use virtualenv, particularly the first time you install [S]crape (see virtualenv).

This gives you a clean, isolated Python installation of [S]crape to start with. Once that is working, you may consider installing [S]crape into your system Python’s site-packages.

To properly use virtualenv, you’ll need pip. Ensure you have pip installed:

$ which pip

If you don’t have pip installed, then install it:

$ easy_install pip

If you do have pip, be sure it’s up-to-date:

$ pip install --upgrade pip
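With pip up to date, the remaining virtualenv step is to create and activate the environment. A minimal sketch, assuming virtualenv can be installed with pip; the function name make_scrape_env and the default directory ~/scrape-env are illustrative choices, not part of [S]crape. Because activation changes the current shell's PATH, source or paste this into your shell rather than running it as a separate script.

```shell
# Hypothetical helper: create a virtualenv for [S]crape and activate it.
# Usage: make_scrape_env [DIR]   (DIR defaults to ~/scrape-env)
make_scrape_env() {
    scrape_env_dir="${1:-$HOME/scrape-env}"
    # Install virtualenv with pip only if it is not already available.
    if ! command -v virtualenv >/dev/null 2>&1; then
        pip install virtualenv || return 1
    fi
    # Create the isolated environment unless it already exists.
    if [ ! -f "$scrape_env_dir/bin/activate" ]; then
        virtualenv "$scrape_env_dir" || return 1
    fi
    # Activate it in the current shell: python and pip now resolve to
    # the copies inside $scrape_env_dir.
    . "$scrape_env_dir/bin/activate"
}
```

After running make_scrape_env, which pip should point into ~/scrape-env/bin, so the setup.py install below goes into the isolated environment; run deactivate to leave it.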

Todo

The scrape.gz install file has not been debugged yet (installing from it does not mirror setup.py).

Next, install the current version of [S]crape. For now, this must be done from source: clone a copy of [S]crape and run setup.py:

$ hg clone ssh://hg@bitbucket.org/yarko/scrape
$ cd scrape
$ python setup.py install
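To confirm the installation, you can check that the libraries listed at the top of this page import cleanly. A minimal sketch: PyYAML is imported as yaml, the module name envoy for [S]crape’s version of envoy is an assumption, and the [S]crape plugin library is left out because its module name is not given here. The helper name check_scrape_deps is hypothetical.

```shell
# Hypothetical helper: report whether each dependency listed above imports.
# Module names differ from package names (PyYAML is imported as "yaml");
# "envoy" is assumed to be the import name of [S]crape's version of envoy.
check_scrape_deps() {
    deps_status=0
    for mod in argparse lxml cssselect yaml selenium envoy; do
        if python -c "import $mod" >/dev/null 2>&1; then
            echo "ok       $mod"
        else
            echo "MISSING  $mod"
            deps_status=1
        fi
    done
    return $deps_status
}

check_scrape_deps || echo "Some dependencies failed to import."
```

Run it with the same python you used for setup.py install (inside the virtualenv, if you created one).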

Footnotes

[1] Firefox is the only browser officially supported by [S]crape. As an alternative, you may try a current version of Chrome, but you will need to download ChromeDriver (the Chrome WebDriver). With some combinations of Chrome, ChromeDriver and Selenium versions, timeouts did not work properly. On some medical journal sites with continuously streaming advertisements, Chrome never responded (calls from [S]crape would not return).