Category:
Multilingual processing
Development tool: Java
File size: 1583KB
Downloads: 11
Upload date: 2008-04-25 13:46:22
Description: segment, a simple Chinese word segmentation program. The command line is as follows:
java -jar segmenter.jar [-b|-g|-8|-s|-t] inputfile.txt
-b Big5, -g GB2312, -8 UTF-8, -s simp. chars, -t trad. chars
Segmented text will be saved to inputfile.txt.seg
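The usage text above can be wrapped in a small Java launcher. The following is only a minimal sketch: the class name `SegmenterInvocation` and its helper methods are hypothetical and not part of the package. It assumes `segmenter.jar` sits in the working directory, that a single option such as `-8` is passed, and that the output name is the input name plus `.seg`, as the usage text states.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not shipped with the package.
public class SegmenterInvocation {

    // Builds the argument list for one run, e.g. flag "-8" for UTF-8 input.
    static List<String> buildCommand(String flag, Path input) {
        return Arrays.asList("java", "-jar", "segmenter.jar", flag, input.toString());
    }

    // Per the usage text, output is saved to the input file name plus ".seg".
    static Path expectedOutput(Path input) {
        return Paths.get(input.toString() + ".seg");
    }

    public static void main(String[] args) throws Exception {
        Path input = Paths.get("inputfile.txt");
        List<String> command = buildCommand("-8", input);
        System.out.println(String.join(" ", command));
        System.out.println("Expected output file: " + expectedOutput(input));

        // Only launch the segmenter if the jar is actually present (assumption:
        // it lives in the current working directory).
        if (Files.exists(Paths.get("segmenter.jar")) && Files.exists(input)) {
            int exitCode = new ProcessBuilder(command).inheritIO().start().waitFor();
            System.out.println("Exit code: " + exitCode);
        }
    }
}
```

Swapping `-8` for `-b` or `-g` would select Big5 or GB2312 according to the option list above.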
File list:
bothlexu8.txt
data
....\sforeign_u8.txt
....\snotname_u8.txt
....\snumbers_u8.txt
....\ssurname_u8.txt
....\tforeign_u8.txt
....\tnotname_u8.txt
....\tnumbers_u8.txt
....\tsurname_u8.txt
META-INF
........\MANIFEST.MF
segmenter.class
segmenter.java
simplexu8.txt
tradlexu8.txt