[1] WANG Suhong, NING Hui, YANG Song, et al. Research on plagiarism detection method based on SVMs[J]. Applied Science and Technology, 2015, 42(05): 51-54, 60. [doi:10.11991/yykj.201503013]

Research on plagiarism detection method based on SVMs

Applied Science and Technology [ISSN: 1009-671X / CN: 23-1191/U]

Volume:
42
Issue:
2015, No. 05
Pages:
51-54, 60
Section:
Computer Technology and Applications
Publication date:
2015-10-05

Article Info

Title:
Research on plagiarism detection method based on SVMs
Author(s):
WANG Suhong, NING Hui, YANG Song, XU Li
Department of Computer Science and Technology, Harbin Engineering University, Harbin 150001, China
Keywords:
plagiarism detection; support vector machine; information retrieval; feature extraction
CLC number:
TP391
DOI:
10.11991/yykj.201503013
Document code:
A
Abstract:
This paper proposes a plagiarism detection method based on information retrieval and support vector machines (SVMs). The method comprises two subtasks: candidate document retrieval and SVM-based detailed comparison. First, an information retrieval system searches the reference document set for the source documents corresponding to each suspicious document, and the results form the candidate document set. Next, features are extracted from each document pair <suspicious document, candidate document>, the feature values are written as a vector, and these feature vectors are used to train an SVM classifier. Finally, feature vectors extracted from the test set are fed to the classifier, which predicts whether a suspicious document contains plagiarism. Experiments show that the proposed method detects plagiarism effectively and improves both precision and recall to some extent.
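The abstract does not name the retrieval system, the extracted features, or the SVM kernel, so the following Python sketch only illustrates the four steps it describes, under explicit assumptions. Raw term-frequency cosine similarity stands in for the information retrieval system; the three features (cosine similarity plus word 2-gram and 3-gram containment) are illustrative choices; and the classifier is a linear SVM fitted by subgradient descent, which may differ from the authors' implementation. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def tokens(text):
    """Lowercase whitespace tokenization (a simplifying assumption)."""
    return text.lower().split()


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_candidates(suspicious, references, k=2):
    """Step 1: keep the k reference documents most similar to the suspicious one."""
    query = Counter(tokens(suspicious))
    ranked = sorted(references,
                    key=lambda doc: cosine(query, Counter(tokens(doc))),
                    reverse=True)
    return ranked[:k]


def containment(sus_words, cand_words, n):
    """Share of the suspicious document's word n-grams found in the candidate."""
    sus = {tuple(sus_words[i:i + n]) for i in range(len(sus_words) - n + 1)}
    cand = {tuple(cand_words[i:i + n]) for i in range(len(cand_words) - n + 1)}
    return len(sus & cand) / len(sus) if sus else 0.0


def pair_features(suspicious, candidate):
    """Step 2: feature vector for one <suspicious, candidate> pair."""
    s, c = tokens(suspicious), tokens(candidate)
    return [cosine(Counter(s), Counter(c)), containment(s, c, 2), containment(s, c, 3)]


def train_linear_svm(X, y, lam=0.01, eta=0.1, epochs=200):
    """Step 3: linear SVM fitted by subgradient descent on the regularized hinge loss.

    Labels are +1 (plagiarized) and -1 (not plagiarized).
    """
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            w = [(1 - eta * lam) * wj for wj in w]  # L2 regularization step
            if margin < 1:  # hinge loss is active for this sample
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
                b += eta * yi
    return w, b


def predict(model, x):
    """Step 4: classify a feature vector as +1 or -1."""
    w, b = model
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


def precision_recall(y_true, y_pred):
    """Precision and recall for the positive (plagiarized) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == -1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy labelled pairs, invented purely for illustration.
train_pairs = [
    ("the quick brown fox jumps over the lazy dog",
     "the quick brown fox jumps over the lazy dog today", 1),
    ("support vector machines separate classes with a maximum margin",
     "support vector machines separate classes with a maximum margin hyperplane", 1),
    ("the quick brown fox jumps over the lazy dog",
     "stock markets fell sharply on monday morning", -1),
    ("support vector machines separate classes with a maximum margin",
     "the recipe needs two eggs and a cup of flour", -1),
]
model = train_linear_svm([pair_features(s, c) for s, c, _ in train_pairs],
                         [label for _, _, label in train_pairs])
```

To classify a new suspicious document, one would call `retrieve_candidates` on it, build `pair_features` for each returned candidate, and pass each vector to `predict`. The retrieval step keeps the number of expensive pairwise comparisons small, which is the reason the abstract separates candidate retrieval from detailed comparison.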

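Memo

Received: 2015-03-17; revised: not stated.
Funding: National Natural Science Foundation of China (61201084).
About the authors: WANG Suhong (born 1986), female, master's student; NING Hui (born 1964), female, associate professor and master's supervisor.
Corresponding author: NING Hui, E-mail: ninghui@hrbeu.edu.cn.
Last Update: 2015-10-20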