Wednesday 28 March 2012

Downloading full CiteSeerX data

Just saw this link and found it very interesting.

Link: http://b010.blogspot.com/2008/11/downloading-full-citeseerx-data.html

I am copying the steps here as a backup, in case the original link dies.

Steps for downloading the full dataset from CiteSeerX:
  1. Download and extract the "Demo" from http://www.oclc.org/research/software/oai/harvester.htm
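  2. Go to the directory of the extracted files and run the following command to download the full CiteSeerX dataset to the file "citeseerx_alldata.xml". The ";" classpath separator shown here is for Windows; on Linux or macOS use ":" instead, otherwise the shell treats each ";" as the end of a command.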
    java -classpath .;oaiharvester.jar;xerces.jar org.acme.oai.OAIReaderRawDump http://citeseerx.ist.psu.edu/oai2 -o citeseerx_alldata.xml
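
For anyone running this from a Unix-style shell, here is a minimal sketch of the same command with the classpath separator chosen per platform. The variable names (SEP, CP, OUT) and the check for Java and the jar file are my own additions, not part of the original instructions; the class name, URL, and output file are the ones from step 2.

```shell
#!/bin/sh
# Pick the Java classpath separator: ";" for Windows-style shells, ":" elsewhere.
case "$(uname -s)" in
  CYGWIN*|MINGW*|MSYS*) SEP=';' ;;
  *)                    SEP=':' ;;
esac

# Quoting the classpath keeps the shell from splitting the command at ";".
CP=".${SEP}oaiharvester.jar${SEP}xerces.jar"
OUT="citeseerx_alldata.xml"

if command -v java >/dev/null 2>&1 && [ -f oaiharvester.jar ]; then
  # Same harvester invocation as in step 2, with a quoted classpath.
  java -classpath "$CP" org.acme.oai.OAIReaderRawDump \
    http://citeseerx.ist.psu.edu/oai2 -o "$OUT"
else
  # Java or the harvester jar is missing here, so only show the command.
  echo "java -classpath \"$CP\" org.acme.oai.OAIReaderRawDump http://citeseerx.ist.psu.edu/oai2 -o $OUT"
fi
```

Run it from the directory where the "Demo" archive was extracted. If Java or oaiharvester.jar is not found, it only prints the command it would have run.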

Thanks to the author for that.

--
Cheers,
Vu

2 comments:

  1. When I run the above command, the following error comes up. Can you please help me with this?

    oaiharvester.jar: command not found
    xerces.jar: command not found


  2. Hi,
    Please visit this site: http://www.oclc.org/research/activities/harvester/harvester.html

    Try to find the related info in the ReadMe file.
