Universal Dependencies POS Tagger for en / English
A POS tagger for en / English using the Universal Dependencies POS tagset.
The model is trained on all available corpora except the test corpus. Evaluation on the UD_English_test set gives an accuracy of 0.9377; accuracy on out-of-vocabulary words (words not seen in the training set) is 0.7735 (case-sensitive) / 0.8099 (case-insensitive). Evaluation on the Penn test set gives an accuracy of 0.9054; accuracy on out-of-vocabulary words is 0.8597 (case-sensitive) / 0.8769 (case-insensitive).
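The two out-of-vocabulary figures differ because case-insensitive matching treats, say, "The" as seen when "the" occurred in training, so a different (smaller) set of tokens counts as OOV. A minimal sketch of the two measures follows, assuming OOV means a token whose surface form is absent from the training vocabulary and case-insensitive means comparing lower-cased forms. The function names and toy data are hypothetical; this is not the evaluation script that produced the figures above.

```python
import math
from typing import Iterable, Sequence


def accuracy(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Fraction of tokens whose predicted tag equals the gold tag."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must be the same length")
    if not gold:
        return math.nan
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def oov_accuracy(
    words: Sequence[str],
    gold: Sequence[str],
    pred: Sequence[str],
    train_vocab: Iterable[str],
    case_sensitive: bool = True,
) -> float:
    """Accuracy restricted to tokens whose word form never occurred in training."""
    if case_sensitive:
        vocab = set(train_vocab)
        keys = list(words)
    else:
        # Lower-case both sides, so "The" counts as seen if "the" was in training.
        vocab = {w.lower() for w in train_vocab}
        keys = [w.lower() for w in words]
    oov = [i for i, k in enumerate(keys) if k not in vocab]
    if not oov:
        return math.nan  # no OOV tokens: the score is undefined
    return sum(gold[i] == pred[i] for i in oov) / len(oov)


train_vocab = ["the", "cat", "sat"]
words = ["The", "dog", "sat"]
gold = ["DET", "NOUN", "VERB"]
pred = ["DET", "PROPN", "VERB"]

print(accuracy(gold, pred))                                 # 0.6666666666666666
print(oov_accuracy(words, gold, pred, train_vocab))         # 0.5 ("The" and "dog" are OOV)
print(oov_accuracy(words, gold, pred, train_vocab, False))  # 0.0 (only "dog" is OOV)
```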
|Annotation type|Description|
|---|---|
|:Token|Tokens generated with the default tokeniser. The Universal Dependencies POS tag is stored in the feature "upos".|

Additional annotations available if selected:

|Annotation type|Description|
|---|---|
|:Sentence|The sentence annotation created by the default regular-expression sentence splitter.|
|:SpaceToken|As generated with the default tokeniser.|
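To pair each token with its "upos" tag, a client only needs the token offsets and that one feature. The sketch below assumes the response uses GATE's JSON layout: the document text under "text" and annotations grouped by type under "entities", each with "indices" character offsets plus its features. That layout, the sample response and the `upos_tags` helper are assumptions for illustration; check a real response from this pipeline before relying on them.

```python
import json
from typing import List, Tuple

# Hypothetical sample response; the field names are assumed, not documented here.
SAMPLE = """{"text": "The cat sat.",
 "entities": {"Token": [
   {"indices": [0, 3], "upos": "DET"},
   {"indices": [4, 7], "upos": "NOUN"},
   {"indices": [8, 11], "upos": "VERB"},
   {"indices": [11, 12], "upos": "PUNCT"}]}}"""


def upos_tags(response_json: str) -> List[Tuple[str, str]]:
    """Return (token text, UPOS tag) pairs in document order."""
    doc = json.loads(response_json)
    text = doc["text"]
    tokens = doc.get("entities", {}).get("Token", [])
    pairs = []
    # Sort by start offset, since annotation order is not guaranteed.
    for tok in sorted(tokens, key=lambda t: t["indices"][0]):
        start, end = tok["indices"]
        pairs.append((text[start:end], tok.get("upos", "")))
    return pairs


print(upos_tags(SAMPLE))
# [('The', 'DET'), ('cat', 'NOUN'), ('sat', 'VERB'), ('.', 'PUNCT')]
```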
Use this pipeline
You can process up to 1,200 documents per day free of charge using the REST API, at an average rate of 2 documents/sec. Higher quotas are available to research users by arrangement; contact us for details.
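At the quoted average rate, the full free allowance takes about 1,200 / 2 = 600 seconds (10 minutes) of requests. A minimal client-side throttle that spaces calls 1/rate seconds apart and stops at the daily allowance could look like this. It assumes you supply your own `send` function that posts one document to the pipeline's endpoint (not shown here); `process_throttled` and the constant names are hypothetical, the sketch does not track when the daily quota resets, and it says nothing about how the server enforces its limits. The clock and sleep functions are injectable so the pacing can be tested without waiting.

```python
import time
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")

DAILY_FREE_QUOTA = 1200  # documents per day, from the text above
AVERAGE_RATE = 2.0       # documents per second, from the text above


def process_throttled(
    docs: Iterable[T],
    send: Callable[[T], R],
    rate: float = AVERAGE_RATE,
    quota: int = DAILY_FREE_QUOTA,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> List[R]:
    """Call send(doc) for each doc, spacing calls 1/rate seconds apart and
    stopping after `quota` documents. Quota resets are not tracked."""
    interval = 1.0 / rate
    next_slot = clock()
    results: List[R] = []
    for count, doc in enumerate(docs):
        if count >= quota:
            break  # stop instead of exceeding the free allowance
        now = clock()
        if now < next_slot:
            sleep(next_slot - now)  # wait for the next permitted slot
        results.append(send(doc))
        next_slot = max(next_slot, now) + interval
    return results


# Time needed to use the whole free allowance at the average rate, in seconds.
print(DAILY_FREE_QUOTA / AVERAGE_RATE)  # 600.0
```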
The API endpoint for this pipeline is:
You can process any amount of data with this pipeline on a pay-as-you-go basis, for GBP 0.80 per hour. This can be data you upload yourself, data you collected from Twitter, or the results of a previous job.
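A rough budget follows directly from the hourly rate. The sketch below is a back-of-the-envelope estimator, not the billing formula: the text does not say how partial hours are charged, so it offers both proportional billing and rounding up to whole hours as assumptions. The function name is hypothetical, and you must supply your own estimate of processing time.

```python
import math

RATE_GBP_PER_HOUR = 0.80  # from the text above


def estimate_cost_gbp(processing_seconds: float, round_up_to_hour: bool = False) -> float:
    """Estimated charge in GBP for a job of the given processing time.

    round_up_to_hour=False assumes charges are proportional to time used;
    True assumes every started hour is charged in full. Both are assumptions.
    """
    hours = processing_seconds / 3600.0
    if round_up_to_hour:
        hours = math.ceil(hours)
    return round(hours * RATE_GBP_PER_HOUR, 2)


# A 90-minute job: 1.5 h x 0.80 = 1.20 proportionally, or 2 h x 0.80 = 1.60 rounded up.
print(estimate_cost_gbp(90 * 60))        # 1.2
print(estimate_cost_gbp(90 * 60, True))  # 1.6
```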