Stanford Tagger

File name: stanford tagger

Category: Enterprise Management

Development tool:

File size: 1.41 MB

Downloads: 1

Upload date: 2013-04-10

Uploader: bbkin*****


Description:

About

A Part-Of-Speech Tagger (POS Tagger) is a piece of software that reads text in some language and assigns a part of speech to each word (and other token), such as noun, verb, adjective, etc., although computational applications generally use more fine-grained POS tags like 'noun-plural'. This software is a Java implementation of the log-linear part-of-speech taggers described in these papers (if citing just one paper, cite the 2003 one):

Kristina Toutanova and Christopher D. Manning. 2000. Enriching the Knowledge Sources Used in a Maximum Entropy Part-of-Speech Tagger. In Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora (EMNLP/VLC-2000), pp. 63-70.

Kristina Toutanova, Dan Klein, Christopher Manning, and Yoram Singer. 2003. Feature-Rich Part-of-Speech Tagging with a Cyclic Dependency Network. In Proceedings of HLT-NAACL 2003, pp. 252-259.

The tagger was originally written by Kristina Toutanova. Since that time, Dan Klein, Christopher Manning, William Morgan, Anna Rafferty, Michel Galley, and John Bauer have improved its speed, performance, usability, and support for other languages.

The system requires Java 1.6+ to be installed. Depending on whether you're running 32-bit or 64-bit Java and on the complexity of the tagger model, you'll need somewhere between 60 and 200 MB of memory to run a trained tagger (i.e., you may need to give java an option like java -mx200m). Training a tagger needs considerably more memory: the amount again depends on the complexity of the model, but at least 1 GB is usually needed, often more.

Several downloads are available. The basic download contains two trained tagger models for English. The full download contains three trained English tagger models, an Arabic tagger model, a Chinese tagger model, and a German tagger model. Both versions include the same source and other required files.
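The memory figures quoted above (60 to 200 MB to run a trained tagger, at least 1 GB to train one) can be compared against the heap limit of the running JVM. The following is a minimal sketch, not part of the tagger distribution: the class name TaggerHeapCheck, its method names, and the choice of 200 MB as the "run" threshold are all assumptions made for illustration. Note also that on some JVMs the value reported by Runtime.maxMemory() can be somewhat below the -mx setting, so treat the result as a rough indication.

```java
// Illustrative sketch only: TaggerHeapCheck is a hypothetical helper,
// not a class shipped with the tagger.
final class TaggerHeapCheck {
    static final long MB = 1024L * 1024L;

    // Upper end of the 60-200 MB range quoted for running a trained tagger.
    static final long RUN_BYTES = 200L * MB;

    // The "at least 1 GB" figure quoted for training a tagger.
    static final long TRAIN_BYTES = 1024L * MB;

    /** Returns true when the given heap limit is at least the requirement. */
    static boolean covers(long maxHeapBytes, long requiredBytes) {
        return maxHeapBytes >= requiredBytes;
    }

    /** Summarises which of the two workloads a heap limit is likely to allow. */
    static String describe(long maxHeapBytes) {
        if (covers(maxHeapBytes, TRAIN_BYTES)) {
            return "run and train";
        }
        if (covers(maxHeapBytes, RUN_BYTES)) {
            return "run only";
        }
        return "possibly too small";
    }

    public static void main(String[] args) {
        // maxMemory() reports the most heap this JVM will attempt to use.
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.println("Max heap: " + (maxHeap / MB) + " MB -> " + describe(maxHeap));
    }
}
```

Running this class with different heap options (for example the -mx200m option mentioned above) shows how the setting changes the reported limit before a model is loaded.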
The tagger can be retrained on any language, given POS-annotated training text for the language.

Part-of-speech name abbreviations: The English taggers use the Penn Treebank tag set. Here are some links to documentation of the Penn Treebank English POS tag set: 1993 Computational Linguistics article in PDF, AMALGAM page, Aoife Cahill's list. See the included README-Models.txt in the models directory for more information about the tagsets for the other languages.

The tagger is licensed under the GNU General Public License (v2 or later). Source is included. The package includes components for command-line invocation, running as a server, and a Java API. The tagger code is dual licensed (in a similar manner to MySQL, etc.). Open source licensing is under the full GPL, which allows many free uses. For distributors of proprietary software, commercial licensing with a ready-to-sign agreement is available. If you don't need a commercial license, but would like to support maintenance of these tools, we welcome gift funding.
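To make the Penn Treebank abbreviations mentioned above more concrete, here is a small sketch that expands a handful of tags into plain descriptions. It rests on several assumptions: the class PennTagGlossary is hypothetical, the glossary covers only a few tags from the full tag set, the input is assumed to be in word_TAG form with an underscore separator (the text above does not specify the output format), and the sample line is hand-written, not real tagger output. It is written for Java 8 or later.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: PennTagGlossary is a hypothetical helper,
// not part of the tagger's Java API.
final class PennTagGlossary {
    // A deliberately partial glossary of Penn Treebank tags.
    static final Map<String, String> GLOSS = new LinkedHashMap<>();
    static {
        GLOSS.put("DT", "determiner");
        GLOSS.put("JJ", "adjective");
        GLOSS.put("NN", "noun, singular or mass");
        GLOSS.put("NNS", "noun, plural");
        GLOSS.put("RB", "adverb");
        GLOSS.put("VB", "verb, base form");
        GLOSS.put("VBD", "verb, past tense");
        GLOSS.put("VBP", "verb, non-3rd person singular present");
    }

    /** Splits "word_TAG" at the last underscore so words containing '_' survive. */
    static String[] split(String token) {
        int i = token.lastIndexOf('_');
        if (i <= 0 || i == token.length() - 1) {
            throw new IllegalArgumentException("Expected word_TAG, got: " + token);
        }
        return new String[] { token.substring(0, i), token.substring(i + 1) };
    }

    /** Produces one "word -> TAG (description)" entry per whitespace-separated token. */
    static List<String> explain(String taggedLine) {
        List<String> out = new ArrayList<>();
        for (String token : taggedLine.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // blank input yields an empty list
            }
            String[] wordAndTag = split(token);
            String gloss = GLOSS.getOrDefault(wordAndTag[1], "not in this partial glossary");
            out.add(wordAndTag[0] + " -> " + wordAndTag[1] + " (" + gloss + ")");
        }
        return out;
    }

    public static void main(String[] args) {
        // Hand-written sample in the assumed word_TAG format.
        for (String line : explain("The_DT dogs_NNS bark_VBP loudly_RB")) {
            System.out.println(line);
        }
    }
}
```

The distinction between NN and NNS is an instance of the fine-grained 'noun-plural' style of tag described earlier; for the non-English models, the tag sets documented in README-Models.txt would need their own glossary.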

(Generated automatically by the system; the download contents can be reviewed before downloading)

Download file list

Archive: 6b5d9b343f33b9a0c4406e67b250b03b.jar (listing)
