Installing [S]crape will install or upgrade the following Python libraries:
- argparse
- lxml
- cssselect
- PyYAML
- selenium
- [S]crape's version of envoy
- a [S]crape plugin library
Installation requires you have a compiler on your machine.
For Linux systems, this should already be the case.
For Macintosh OS/X systems, download Xcode for free from the Mac App Store (and also install the command line tools). Alternatively, you may be able to install just the command-line tools (see https://github.com/kennethreitz/osx-gcc-installer - we have not tried this).
For Windows platforms, this is not a straightforward process; see the MS Windows section at http://lxml.de/installation.html.
I suggest you use Python's virtualenv, particularly your first time with [S]crape (see virtualenv).
This will ensure you start with an isolated, clean Python install of [S]crape. Once you have this working, you may consider installing [S]crape into your system's Python site-packages.
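A minimal sketch of that workflow, assuming a Unix-like shell; the environment name scrape-env is arbitrary:

```shell
# Create an isolated environment for [S]crape. On Python 2, install the
# virtualenv package first ("pip install virtualenv") and create the
# environment with "virtualenv scrape-env"; on Python 3, the stdlib
# venv module shown here is equivalent.
python3 -m venv scrape-env
. scrape-env/bin/activate
which python    # should now point inside scrape-env
```

With the environment activated, the pip and python commands below operate inside scrape-env rather than your system install.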
To properly use virtualenv, you'll need pip. Ensure you have pip installed:
$ which pip
If you don’t have pip installed, then install it:
$ easy_install pip
If you do have pip, be sure it’s up-to-date:
$ pip install --upgrade pip
Todo: have yet to debug the scrape.gz install file (installation does not mirror setup.py).
Now, install the current version of [S]crape. Currently, you must do this from sources. Clone a copy of [S]crape and run setup.py:
$ hg clone ssh://hg@bitbucket.org/yarko/scrape
$ cd scrape
$ python setup.py install
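Once setup.py finishes, a quick sanity check (a sketch, not part of [S]crape itself) is to import the installed dependencies directly; note that module names are import names, so PyYAML imports as yaml:

```shell
# Confirm the installed dependencies import cleanly; a non-zero exit
# (and a traceback) means something is missing from the environment.
python -c "import argparse, lxml, cssselect, yaml, selenium" \
  && echo "dependencies OK"
```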
Footnotes
[1] Firefox is the only browser officially supported for [S]crape. As an alternative, you may try a current version of Chrome, but note that you will need to download a chrome-webdriver. For some combinations of versions of Chrome, chrome-webdriver, and selenium, timeouts did not work properly. For some medical journal sites with continuous stream advertising, Chrome would not respond (would never return when called from [S]crape).