trainingsetbuilder

This project helps us build a training set for the Aztec tool classification project. Users can manually classify publications as either containing a new software tool or not.

#setup
pip install virtualenv
virtualenv --python=`which python` ~/.virtualenvs/django
source ~/.virtualenvs/django/bin/activate
pip install django

#launch
source ~/.virtualenvs/django/bin/activate
python manage.py runserver

Find a publication to classify at localhost:8000/classify/next

View the database at localhost:8000/admin

How to add a batch of new publications to the database (example):

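# Python 2 example; run it where the Django models can load, e.g. from `python manage.py shell`
from classify.models import Publication
import urllib, urllib2, json

journal = 'Bioinformatics (Oxford, England)'
count = 5000
# URL-encode the search term, since it contains spaces, quotes, and brackets
term = urllib.quote('"' + journal + '"[Journal]')
query = 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=' + term + '&retmode=json&retmax=' + str(count)

# Fetch the matching PubMed IDs and store each one with classification=-1
response = urllib2.urlopen(query).read()
data = json.loads(response)
idlist = data["esearchresult"]["idlist"]
for pm_id in idlist:
    p = Publication(pmid=pm_id, classification=-1)
    p.save()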
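
For anyone on Python 3, where `urllib2` no longer exists, the same steps might look roughly like the sketch below. It is not part of this project: the helper names `build_esearch_url`, `fetch_pmids`, and `save_new_publications` are made up for illustration, and using `get_or_create` to skip PMIDs that are already stored assumes that `pmid` is what identifies a publication.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Same ESearch endpoint as in the example above
ESEARCH_URL = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(journal, count):
    """Return an ESearch URL asking for up to `count` PubMed IDs from `journal`."""
    params = {
        "db": "pubmed",
        "term": '"{}"[Journal]'.format(journal),
        "retmode": "json",
        "retmax": count,
    }
    # urlencode escapes the spaces, quotes, and brackets in the search term
    return ESEARCH_URL + "?" + urlencode(params)


def fetch_pmids(journal, count):
    """Query ESearch and return the list of PubMed IDs (as strings)."""
    with urlopen(build_esearch_url(journal, count)) as response:
        data = json.loads(response.read().decode("utf-8"))
    return data["esearchresult"]["idlist"]


def save_new_publications(pmids, publication_model):
    """Store each PMID with classification=-1, skipping ones already saved.

    `publication_model` is expected to be the Django model class
    (classify.models.Publication); returns the number of rows created.
    """
    created_count = 0
    for pm_id in pmids:
        _, created = publication_model.objects.get_or_create(
            pmid=pm_id, defaults={"classification": -1}
        )
        if created:
            created_count += 1
    return created_count
```

From `python manage.py shell`, this could be used as `from classify.models import Publication` followed by `save_new_publications(fetch_pmids('Bioinformatics (Oxford, England)', 5000), Publication)`. Passing the model in as an argument keeps the helpers importable outside a configured Django environment.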