One major impediment to the implementation of machine learning algorithms is the lack of high-quality training sets tagged by human coders. The process of assembling a "gold standard" is labor-intensive, costly, and error-prone. Software that makes the construction of valid test sets simpler and more reliable would greatly benefit any machine learning project.
PamTag is a web-based tool that provides this functionality. Project managers create their data sets by uploading CSV documents containing up to 5,000 text samples, and then create a tagging project by specifying the fields they would like taggers to capture. Taggers, whose accounts are managed by the project manager, are presented with a randomized set of text entries to tag. Project managers can run reports on total tag counts, accuracy, and inter-rater reliability.
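One common way to compute inter-rater reliability for categorical tags is Cohen's kappa, which corrects observed agreement between two taggers for the agreement expected by chance. The sketch below is illustrative only; the class and method names are hypothetical and do not reflect PamTag's actual implementation.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: Cohen's kappa for two taggers who labeled the
// same items. Not PamTag's actual API.
public class KappaDemo {

    static double cohensKappa(List<String> rater1, List<String> rater2) {
        if (rater1.size() != rater2.size() || rater1.isEmpty()) {
            throw new IllegalArgumentException("ratings must align");
        }
        int n = rater1.size();

        // Observed agreement: fraction of items with identical tags.
        long agree = 0;
        for (int i = 0; i < n; i++) {
            if (rater1.get(i).equals(rater2.get(i))) agree++;
        }
        double po = (double) agree / n;

        // Chance agreement: product of each rater's label frequencies.
        Map<String, Integer> c1 = new HashMap<>();
        Map<String, Integer> c2 = new HashMap<>();
        for (String s : rater1) c1.merge(s, 1, Integer::sum);
        for (String s : rater2) c2.merge(s, 1, Integer::sum);
        double pe = 0.0;
        for (Map.Entry<String, Integer> e : c1.entrySet()) {
            pe += (e.getValue() / (double) n)
                * (c2.getOrDefault(e.getKey(), 0) / (double) n);
        }
        return (po - pe) / (1.0 - pe);
    }

    public static void main(String[] args) {
        List<String> a = List.of("pos", "pos", "neg", "neg", "pos", "neg");
        List<String> b = List.of("pos", "neg", "neg", "neg", "pos", "pos");
        // 4/6 observed agreement, 0.5 expected -> kappa = 0.333
        System.out.printf("kappa = %.3f%n", cohensKappa(a, b));
    }
}
```

A kappa near 1 indicates strong agreement beyond chance, while values near 0 suggest the taggers agree no more often than random labeling would predict.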
PamTag is implemented as a Java EE application running on commodity hardware and can scale to thousands of concurrent users. To further simplify text analysis projects, PamTag will soon offer embedded tools for performing Natural Language Processing on data sets.