Category:
Multilingual processing
Development tool: Visual C++
File size: 5404 KB
Downloads: 81
Upload date: 2009-06-22 09:16:57
Description: Extracts the main body text from the source files of university faculty members' personal web pages; it can also be applied to body-text extraction from general web pages.
File list:
网页正文提取
............\正文提取改进
............\............\1.txt
............\............\2.txt
............\............\data
............\............\....\bigram.dat
............\............\....\extend_dict.dat
............\............\....\location_emit.dat
............\............\....\location_roles.dat
............\............\....\location_trans.dat
............\............\....\person_emit.dat
............\............\....\person_roles.dat
............\............\....\person_trans.dat
............\............\....\pos2givenword.dat
............\............\....\postag_emit.dat
............\............\....\postag_tags.dat
............\............\....\postag_trans.dat
............\............\....\unigram.dat
............\............\....\words.dat
............\............\Debug
............\............\.....\IRLAS_DLL.obj
............\............\.....\vc60.pdb
............\............\.....\正文提取改进.exe
............\............\.....\正文提取改进.obj
............\............\.....\正文提取改进.pdb
............\............\gongcheng.txt
............\............\IRLAS.dll
............\............\IRLAS.lib
............\............\IRLAS_config.ini
............\............\IRLAS_DLL.cpp
............\............\IRLAS_DLL.h
............\............\output
............\............\北京化工.txt
............\............\正文提取改进.cpp
............\............\正文提取改进.dsp
............\............\正文提取改进.dsw
............\............\正文提取改进.ncb
............\............\正文提取改进.opt
............\............\正文提取改进.plg