Category: Multilingual processing
Development tool: Java
File size: 3180KB
Downloads: 76
Upload date: 2009-07-13 00:38:33
Description: imdict-chinese-analyzer is the intelligent Chinese word segmentation module of the imdict smart dictionary. Its algorithm is based on the Hidden Markov Model (HMM), and it is a Java reimplementation of ICTCLAS, the Chinese word segmentation program from the Institute of Computing Technology, Chinese Academy of Sciences. It can directly provide Simplified Chinese word segmentation support for the Lucene search engine.
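To give a rough feel for how a dictionary-driven statistical segmenter chooses between competing splits, here is a minimal sketch. It is not the code in this package and not the ICTCLAS algorithm: it replaces the HMM with a much simpler unigram model and a Viterbi-style dynamic program over candidate words. The class name `ToySegmenter`, the tiny dictionary, and every frequency value are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy word-lattice segmenter (assumption: unigram model, NOT the package's HMM).
class ToySegmenter {
    // Hypothetical dictionary: word -> made-up relative frequency.
    private static final Map<String, Double> FREQ = new HashMap<>();
    private static final double TOTAL;
    private static final double FLOOR = 0.5; // frequency assumed for unknown single characters
    private static final int MAX_LEN = 3;    // longest dictionary word, in chars

    static {
        FREQ.put("研究", 20.0);
        FREQ.put("研究生", 5.0);
        FREQ.put("生命", 15.0);
        FREQ.put("生", 2.0);
        FREQ.put("命", 2.0);
        FREQ.put("的", 50.0);
        FREQ.put("起源", 10.0);
        double sum = 0.0;
        for (double f : FREQ.values()) {
            sum += f;
        }
        TOTAL = sum;
    }

    // Returns the split with the highest total log-probability.
    // Works on char indices, so it assumes text without surrogate pairs.
    static List<String> segment(String text) {
        int n = text.length();
        double[] best = new double[n + 1]; // best[i]: best score for text[0, i)
        int[] back = new int[n + 1];       // back[i]: start index of the last word
        for (int i = 1; i <= n; i++) {
            best[i] = Double.NEGATIVE_INFINITY;
            for (int j = Math.max(0, i - MAX_LEN); j < i; j++) {
                Double f = FREQ.get(text.substring(j, i));
                if (f == null) {
                    if (i - j > 1) {
                        continue;  // unknown multi-char strings are not candidates
                    }
                    f = FLOOR;     // an unknown single character is always allowed
                }
                double score = best[j] + Math.log(f / TOTAL);
                if (score > best[i]) {
                    best[i] = score;
                    back[i] = j;
                }
            }
        }
        // Walk the back-pointers from the end, then reverse into reading order.
        List<String> words = new ArrayList<>();
        for (int i = n; i > 0; i = back[i]) {
            words.add(text.substring(back[i], i));
        }
        Collections.reverse(words);
        return words;
    }

    public static void main(String[] args) {
        System.out.println(segment("研究生命的起源"));
    }
}
```

With these made-up frequencies, the split 研究 / 生命 scores higher than 研究生 / 命, so the ambiguous prefix is resolved by overall path score rather than by greedy longest match. A real HMM-based segmenter such as the one described above would additionally model transitions between adjacent words, which this sketch leaves out.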
File list:
chinese-analyzer
................\.classpath
................\.project
................\analysis-data
................\.............\bigramdict.dct
................\.............\coredict.dct
................\.............\license.txt
................\.............\readme.txt
................\.............\stopwords_utf8.txt
................\lib
................\...\log4j-1.2.15.jar
................\...\lucene-core-2.4.0.jar
................\src
................\...\net
................\...\...\imdict
................\...\...\......\analysis
................\...\...\......\........\chinese
................\...\...\......\........\.......\AnalyzerProfile.java
................\...\...\......\........\.......\ChineseAnalyzer.java
................\...\...\......\........\.......\SentenceTokenizer.java
................\...\...\......\........\.......\WordTokenizer.java
................\...\...\......\........\stopword
................\...\...\......\........\........\StopDictionary.java
................\...\...\......\........\........\StringComparator.java
................\...\...\......\wordsegment
................\...\...\......\...........\dictionary
................\...\...\......\...........\..........\AbstractDictionary.java
................\...\...\......\...........\..........\BigramDictionary.java
................\...\...\......\...........\..........\WordDictionary.java
................\...\...\......\...........\hhmm
................\...\...\......\...........\....\BiSegGraph.java
................\...\...\......\...........\....\HHMMSegmenter.java
................\...\...\......\...........\....\PathNode.java
................\...\...\......\...........\....\SegGraph.java
................\...\...\......\...........\....\SegToken.java
................\...\...\......\...........\....\SegTokenFilter.java
................\...\...\......\...........\....\SegTokenPair.java
................\...\...\......\...........\util
................\...\...\......\...........\....\CharType.java
................\...\...\......\...........\....\Utility.java
................\...\...\......\...........\....\WordType.java
................\...\...\......\...........\WordSegmenter.java
................\test
................\....\net
................\....\...\imdict
................\....\...\......\analysis
................\....\...\......\........\test
................\....\...\......\........\....\AnalyzerTest.java
................\....\...\......\........\....\StringTest.java
................\test.txt