JOURNAL OF THE CHINA SOCIETY FOR SCIENTIFIC AND TECHNICAL INFORMATION (情报学报)
ISSN 1000-0135
Vol. 30, No. 8, pp. 787-795, August 2011
1. Evolution of Knowledge Networks (I): Growth and Obsolescence
Ma Feicheng and Liu Xiang
(Center for Studies of Information Resources of Wuhan University, Wuhan 430072)
Abstract: To analyze the topology and obsolescence of knowledge under different growth patterns, we constructed an evolution model of knowledge networks. Analysis of a general growth function and simulation showed that when the growth ratio of knowledge is a convergent function, such as a linear function, the degree distribution is independent of the growth pattern and the diachronic citation of vertices decreases. When the growth function is divergent, such as an exponential function, the exponent of the degree distribution is small and diachronic citation increases. The faster the growth, the flatter the degree distribution, which also indicates higher efficiency of knowledge utilization.
Keywords: knowledge, complex networks, evolution, obsolescence, model
2. Collaborative Recommendation Using Smoothing Clustering Based on User Information Matrix
Chang Fuyang, Xu Kan and Lin Hongfei
(School of Computer Science and Technology, Dalian University of Technology, Dalian 116024)
Abstract: Collaborative recommendation technology helps people find items of interest in e-commerce. A common way to generate collaborative recommendations is the nearest-neighbor method. As the number of items grows, the proportion of useful data decreases. To address this sparsity problem, we collect and discretize user information in addition to ordinary rating data and convert it into 0-1 vectors. We compute the N nearest neighbors of each user from the user information matrix and use them to smooth the rating matrix (k-NN smoothing). We then cluster the user rating matrix to predict ratings. Experimental results show that discretizing user information in this way improves the accuracy of rating prediction.
Keywords: collaborative recommendation, user information, data smoothing, rating cluster
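The neighbour-based smoothing described above can be sketched as follows. It assumes binary (0-1) user-information vectors compared with Jaccard similarity, and fills each missing rating with the mean rating that the user's k most similar neighbours gave the item. The similarity measure, k, and the names `jaccard` and `smooth_ratings` are illustrative assumptions rather than the authors' definitions, and the subsequent rating-clustering step is not shown.

```python
from typing import List, Optional

Rating = Optional[float]  # None marks a missing rating


def jaccard(a: List[int], b: List[int]) -> float:
    """Jaccard similarity of two 0-1 user-information vectors."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return inter / union if union else 0.0


def smooth_ratings(info: List[List[int]], ratings: List[List[Rating]], k: int = 2) -> List[List[Rating]]:
    """Fill missing ratings with the mean rating of the k users with the most similar user information."""
    n_users = len(info)
    smoothed = [row[:] for row in ratings]  # copy so the input matrix is not mutated
    for u in range(n_users):
        # Rank the other users by similarity of their discretized user information.
        neighbours = sorted(
            (v for v in range(n_users) if v != u),
            key=lambda v: jaccard(info[u], info[v]),
            reverse=True,
        )[:k]
        for item, r in enumerate(ratings[u]):
            if r is None:
                known = [ratings[v][item] for v in neighbours if ratings[v][item] is not None]
                if known:
                    smoothed[u][item] = sum(known) / len(known)
    return smoothed
```

For example, with `info = [[1, 0, 1, 0], [1, 0, 1, 1], [0, 1, 0, 1]]`, `ratings = [[5, None, 3], [4, 2, None], [1, 5, 2]]` and `k=1`, the two gaps are filled with 2.0 and 3.0, each borrowed from the user's closest neighbour.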
3. Contextual Recommendation-oriented User Preference Drift Recognition Based on Hypergraph Model
Cai Shuqin1, Hu Muhai2, Ye Bo3 and Ma Yutao1
(1.Institute of Enterprise Business Intelligence Engineering, Huazhong University of Science and Technology, Wuhan 430074;
2.Management School, Wuhan Textile University, Wuhan 430074; 3.Guangxi Technology Information Net Center, Nanning 530012)
Abstract: Recognizing user preference drift is key to updating user profiles and keeping the description of user preferences accurate. With the rapid development of mobile commerce, such recognition has recently received great attention. However, most clustering-based research handles item objects with weak N-ary associations inadequately. In this paper, through an analysis of contextual recommendation, we propose a hypergraph model of contextual items and define the similarity between a pair of items, the similarity between a pair of item clusters, and the degree of user preference drift. Based on these definitions, we construct a method for measuring preference drift that uses a two-stage hierarchical clustering framework combined with the multilevel k-way hypergraph partitioning algorithm. Finally, we discuss the time complexity and application mechanism of the method and verify its usefulness with two groups of experiments.
Keywords: hypergraph, contextual recommendation, preference drift
4. Fuzzy Clustering Model and Algorithm Based on Rate Distortion Theory
Guo Chonghui and Zhang Yanchang
(Institute of Systems Engineering, Dalian University of Technology, Dalian 116024)
Abstract: In this paper, clustering is viewed as a process of lossy compression from an information-theoretic perspective. First, an optimization model of fuzzy clustering is built using rate distortion theory. Compared with the classic fuzzy clustering model, the new model introduces into the objective function a new term that describes the complexity of the clustering process. To estimate the number of clusters, a new cluster validity index is also proposed. The fuzzy clustering algorithm based on rate distortion theory is then obtained by solving the optimization model. Finally, numerical experiments compare this algorithm with fuzzy c-means. The results indicate that the proposed algorithm can estimate the number of clusters automatically and runs faster than fuzzy c-means. Moreover, its membership assignments are less ambiguous than those of fuzzy c-means, which makes the results more definite and reliable.
Keywords: fuzzy clustering, rate distortion theory, mutual information, number of clusters
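The rate-distortion view of fuzzy clustering can be illustrated with the textbook alternating updates for rate-distortion clustering at a fixed trade-off parameter beta: each membership is proportional to p_j exp(-beta d_ij), cluster priors are average memberships, and centres are membership-weighted means. This is a sketch of that standard iteration assuming squared Euclidean distortion; it is not necessarily the authors' exact objective, their cluster validity index is not reproduced, and the names `rd_fuzzy_cluster` and `beta` are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance, used here as the distortion measure."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rd_fuzzy_cluster(points: List[Sequence[float]], init_centers: List[Sequence[float]],
                     beta: float, iters: int = 50) -> Tuple[List[List[float]], List[List[float]]]:
    """Alternating rate-distortion updates at a fixed trade-off beta.

    Returns (centers, memberships) where memberships[i][j] = p(cluster j | point i).
    """
    n, c, dim = len(points), len(init_centers), len(points[0])
    centers = [list(ctr) for ctr in init_centers]
    prior = [1.0 / c] * c  # p(cluster j), starts uniform
    memberships: List[List[float]] = []
    for _ in range(iters):
        # Membership update: u_ij proportional to p_j * exp(-beta * d_ij).
        memberships = []
        for x in points:
            d = [sq_dist(x, ctr) for ctr in centers]
            dmin = min(d)  # shift exponents by the smallest distance to avoid underflow
            w = [prior[j] * math.exp(-beta * (d[j] - dmin)) for j in range(c)]
            total = sum(w)
            if total == 0.0:  # all weights vanished: fall back to the nearest centre
                w = [1.0 if d[j] == dmin else 0.0 for j in range(c)]
                total = sum(w)
            memberships.append([wj / total for wj in w])
        # Prior update: p_j is the average membership in cluster j.
        prior = [sum(u[j] for u in memberships) / n for j in range(c)]
        # Centre update: membership-weighted mean of the points.
        for j in range(c):
            weight = sum(u[j] for u in memberships)
            if weight > 0:
                centers[j] = [sum(u[j] * x[k] for u, x in zip(memberships, points)) / weight
                              for k in range(dim)]
    return centers, memberships
```

Larger `beta` weights distortion more heavily and yields crisper memberships; smaller `beta` favours compression, and centres may then drift together.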
5. Study on Clustering of Retrieval Results Based on Co-occurrence Analysis of Keywords
Li Fenglin and He Zhoufang
(Center for Studies of Information Resources of Wuhan University, Wuhan 430072)
Abstract: The continuous growth of the Internet makes it increasingly difficult to improve the efficiency of information retrieval. This paper first extracts keywords from each document using a specific algorithm. It then applies statistical techniques to count co-occurring keywords and form their label matrix, and finally agglomerates them into higher-level clusters with a hierarchical clustering algorithm in order to classify the results returned by the source search engine. Clustering retrieval results helps users navigate query results quickly and efficiently at the topic level and locate relevant information. Compared with Lingo, the labels generated by our algorithm are more readable and more general, and the F-measure shows that our algorithm improves the quality of text clustering to some extent.
Keywords: keywords, co-occurrence, clustering, retrieval results
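The co-occurrence counting and agglomeration steps above can be sketched as single-linkage clustering of keywords, where the link between two keyword groups is the largest number of documents in which a keyword from each group co-occurs. The linkage choice, the stopping threshold `min_cooc` and the function names are assumptions; the paper's keyword extraction algorithm and label-matrix construction are not reproduced.

```python
from collections import Counter
from itertools import combinations
from typing import List, Set


def cooccurrence(docs: List[Set[str]]) -> Counter:
    """Count how many documents contain each unordered (alphabetically sorted) keyword pair."""
    counts: Counter = Counter()
    for kws in docs:
        for a, b in combinations(sorted(kws), 2):
            counts[(a, b)] += 1
    return counts


def cluster_keywords(docs: List[Set[str]], min_cooc: int) -> List[Set[str]]:
    """Single-linkage agglomerative clustering: repeatedly merge the two groups joined
    by the strongest co-occurrence link, stopping when it falls below min_cooc."""
    counts = cooccurrence(docs)
    clusters: List[Set[str]] = [{kw} for kw in sorted(set().union(*docs))]

    def link(c1: Set[str], c2: Set[str]) -> int:
        # Single linkage: the strongest pairwise co-occurrence between the two groups.
        return max(counts[tuple(sorted((a, b)))] for a in c1 for b in c2)

    while len(clusters) > 1:
        (i, j), best = max(
            (((i, j), link(clusters[i], clusters[j])) for i, j in combinations(range(len(clusters)), 2)),
            key=lambda t: t[1],
        )
        if best < min_cooc:
            break
        clusters[i] |= clusters[j]  # i < j, so deleting index j is safe
        del clusters[j]
    return clusters
```

Each resulting keyword group can then serve as a candidate topic label for the retrieval results that contain its keywords.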
6. A Novel Classification Model Based on Relaxed Conservative Inference Rule for Incomplete Data
Qi Ruihua1,2 and Yang Deli2
(1.Modern Education Technology Center, Dalian University of Foreign Languages, Dalian 116044;
2.Institute of System Engineering, School of Management, Dalian University of Technology, Dalian 116024)
Abstract: When the Naive Credal Classifier is used, the proportion of samples that receive a single, determinate classification declines. To solve this problem, this paper improves the conservative inference rule and proposes an incomplete-data classification model based on a relaxed conservative inference rule. Simulation results from comparative experiments with the Naive Bayesian Classifier and the Naive Credal Classifier verify the effectiveness of the model. Furthermore, comparative experiments with style identification as the application background show that the classifier has better overall performance on the style identification data set.
Keywords: classification, incomplete data, interval advantage
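Credal classifiers return a set of classes rather than a single class. One common decision rule for such outputs, interval dominance, can be sketched as below, assuming each class already carries a lower and an upper probability bound (for example, from an imprecise Dirichlet model). Whether the keyword "interval advantage" refers to exactly this rule is an assumption; the relaxed conservative inference rule itself is not shown, and `non_dominated` is a hypothetical name.

```python
from typing import Dict, Set, Tuple


def non_dominated(bounds: Dict[str, Tuple[float, float]]) -> Set[str]:
    """Interval dominance: class a dominates class b if lower(a) > upper(b).

    bounds maps each class to its (lower, upper) probability interval.
    Returns the classes not dominated by any other class.
    """
    return {
        b for b, (_, upper_b) in bounds.items()
        if not any(lower_a > upper_b for a, (lower_a, _) in bounds.items() if a != b)
    }
```

A sample counts as determinately classified when the returned set contains a single class: `{"A": (0.6, 0.8), "B": (0.1, 0.3)}` yields `{"A"}`, whereas overlapping intervals such as `{"A": (0.5, 0.7), "C": (0.3, 0.55)}` leave both classes in the set.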
7. Research on Semantic Text Mining Based on Domain Ontology
Zhang Yufeng and He Chao
(Center for Studies of Information Resources of Wuhan University, Wuhan 430072)
Abstract: To improve the depth and accuracy of text mining, a semantic text mining model based on domain ontology is proposed. In this model, semantic role labeling is applied to semantic analysis so that semantic relations can be extracted accurately. Because traditional knowledge mining algorithms cannot effectively mine a semantic metadatabase, a semantics-based association pattern mining algorithm is designed to acquire deep semantic association patterns from the semantic metadatabase. Experimental results show that the model can mine deep semantic knowledge from text databases, that the patterns obtained have great potential for application, and that the algorithm is highly adaptable and scalable.
Keywords: semantic text mining, domain ontology, semantic patterns
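The association-pattern step can be illustrated with plain support/confidence rules between pairs of semantic items, assuming each sentence has already been reduced by semantic role labeling to a set of `role:concept` items (that item format is hypothetical). This is a generic pairwise association-rule sketch, not the authors' semantics-based algorithm.

```python
from collections import Counter
from itertools import combinations
from typing import List, Set, Tuple


def pair_rules(transactions: List[Set[str]], min_support: float,
               min_conf: float) -> List[Tuple[str, str, float, float]]:
    """Mine rules X -> Y between single semantic items.

    support(X, Y) = fraction of transactions containing both items;
    confidence(X -> Y) = count(X, Y) / count(X).
    Returns (X, Y, support, confidence) tuples.
    """
    n = len(transactions)
    item_count = Counter(item for t in transactions for item in t)
    pair_count = Counter(p for t in transactions for p in combinations(sorted(t), 2))
    rules = []
    for (a, b), c in sorted(pair_count.items()):
        support = c / n
        if support < min_support:
            continue
        # Test the rule in both directions.
        for x, y in ((a, b), (b, a)):
            conf = c / item_count[x]
            if conf >= min_conf:
                rules.append((x, y, support, conf))
    return rules
```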
8. Application Research on Improved PSO Algorithm for Data Prediction Mining
Wang Xiaojia, Yang Shanlin and Xu Dayu
(Hefei University of Technology, Key Laboratory of Process Optimization and Intelligent Decision-making,
Ministry of Education, Hefei 230009)
Abstract: To address premature convergence and the tendency of particle swarm optimization (PSO) to become trapped in local optima, this paper proposes an improved PSO algorithm that overcomes premature convergence. Extremum disturbances and an adaptive adjustment factor are added to the standard PSO algorithm so that it can escape local optima more easily. The paper also analyzes the limitations of the gray model GM(1,1) and presents a self-adaptive PSO algorithm with disturbed extremum, called AdPSO, which is combined with GM(1,1) into a new model for predictive data mining. Finally, an example validates the proposed method and shows that the model achieves higher prediction accuracy.
Keywords: particle swarm optimization algorithm, GM(1,1) model, AdPSO-GM model, forecasting mining
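The standard GM(1,1) steps referred to above (accumulate the series, fit the development coefficient a and grey input b by least squares, then difference the time-response function) can be sketched as below. In the AdPSO-GM model the parameters are presumably tuned by AdPSO rather than by plain least squares; that optimization is not reproduced here, and the function names are illustrative.

```python
import math
from typing import List, Tuple


def gm11_fit(x0: List[float]) -> Tuple[float, float]:
    """Least-squares estimate of GM(1,1) parameters (a, b); needs at least 3 positive values."""
    # 1-AGO: accumulated series x1(k) = x0(1) + ... + x0(k).
    x1, total = [], 0.0
    for v in x0:
        total += v
        x1.append(total)
    # Background values z1(k) = 0.5 * (x1(k) + x1(k-1)) for k = 2..n.
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, len(x0))]
    y = x0[1:]
    # Grey equation x0(k) + a * z1(k) = b is a linear regression y = -a * z + b.
    m = len(z)
    zm, ym = sum(z) / m, sum(y) / m
    slope = sum((zi - zm) * (yi - ym) for zi, yi in zip(z, y)) / sum((zi - zm) ** 2 for zi in z)
    a, b = -slope, ym - slope * zm
    if abs(a) < 1e-12:
        raise ValueError("development coefficient a is zero; GM(1,1) is undefined")
    return a, b


def gm11_forecast(x0: List[float], steps: int) -> List[float]:
    """Forecast the next `steps` values of x0 with the GM(1,1) time-response function."""
    a, b = gm11_fit(x0)
    n = len(x0)

    def x1_hat(k: int) -> float:
        # Time response (k counted from 0): x1_hat = (x0(1) - b/a) * exp(-a k) + b/a.
        return (x0[0] - b / a) * math.exp(-a * k) + b / a

    # Restore the original series by differencing the fitted accumulated series.
    return [x1_hat(k) - x1_hat(k - 1) for k in range(n, n + steps)]
```

On a series growing by 10% per step, for example `[2 * 1.1 ** k for k in range(6)]`, the fitted a is negative (growth) and the one-step forecast is within about 0.2% of the true next value.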
9. A Tag Ranking Method Based on HITS and Random Walk
Wang Zhaopeng1, Hu Xia2, Ni Ning3 and Wang Can1
(1.College of Computer Science, Zhejiang University, Hangzhou 310027;
2.Hangzhou Science and Technology Information Research Institute, Hangzhou 310001;
3.Information Technology Department, Zhejiang Vocational College of Commerce, Hangzhou 310012)
Abstract: With the rise of Web 2.0 applications, a new trend in information science, the shift from “organizing documents” to “organizing knowledge”, is emerging. Social tagging, one of the important Web 2.0 applications, is making this trend a reality by adding meaningful annotations to Web pages. However, existing tag ranking methods are not effective for knowledge organization. To improve tag ranking performance, this paper proposes a new ranking algorithm that uses the relationships among users, tags and Web documents in a tripartite collaborative tagging model. By combining HITS and random walk, we exploit the mutual reinforcement between high-quality users and high-quality tags, and retrieve related tags by measuring the similarity between tags. Experimental results on the Delicious dataset demonstrate the effectiveness of the algorithm.
Keywords: tag, ranking, HITS, random walk
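The mutual reinforcement between users and tags can be sketched as HITS iterations on the user-tag part of the tripartite model: a user's score sums the scores of the tags they applied, and a tag's score sums the scores of the users who applied it. This sketch assumes distinct (user, tag) pairs and omits the document layer and the random-walk step; `hits_user_tag` is a hypothetical name.

```python
import math
from typing import Dict, List, Tuple


def hits_user_tag(assignments: List[Tuple[str, str]],
                  iters: int = 50) -> Tuple[Dict[str, float], Dict[str, float]]:
    """HITS-style mutual reinforcement on distinct (user, tag) assignments.

    Returns (user_score, tag_score), each L2-normalised.
    """
    users = {u for u, _ in assignments}
    tags = {t for _, t in assignments}
    user_score = {u: 1.0 for u in users}
    tag_score = {t: 1.0 for t in tags}
    for _ in range(iters):
        # Tag quality: sum of the quality of the users who applied the tag.
        tag_score = {t: 0.0 for t in tags}
        for u, t in assignments:
            tag_score[t] += user_score[u]
        # User quality: sum of the quality of the tags the user applied.
        user_score = {u: 0.0 for u in users}
        for u, t in assignments:
            user_score[u] += tag_score[t]
        # L2-normalise both score vectors so the iteration converges.
        for scores in (tag_score, user_score):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for key in scores:
                scores[key] /= norm
    return user_score, tag_score
```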
10. Research on the Recognition of Business Organizations Names in Internet
Zhao Jie1,2, Liu Yanhong3 and Jin Peiquan3
(1.School of Business Administration, Anhui University, Hefei 230029;
2.School of Management, University of Science and Technology of China, Hefei 230026;
3.School of Computer Science and Technology, University of Science and Technology of China, Hefei 230026)
Abstract: The Internet has become one of the major sources from which enterprises and organizations acquire competitive intelligence, and many enterprises urgently need a Web-based system for acquiring competitor intelligence. In such a system, a fundamental issue is recognizing business organization names on the Internet, because this is the basis for identifying competitors and extracting further intelligence from the Web. In this paper, we present a new approach to recognizing business organizations on the Internet that considers the semantic relationship between business organization names and their context in Web pages, and recognizes organization names by integrating semantic annotation with a Hidden Markov Model (HMM). We conduct an experiment on a real dataset consisting of a large number of Chinese Web pages and compare our approach with three competing algorithms, CHMM, MEM and SVM, in terms of recall, precision and F-measure. The results show that our approach improves the effectiveness of business organization name recognition. Moreover, it is a general-purpose algorithm that suits different types of business organization recognition tasks.
Keywords: competitive intelligence, internet, business organization, hidden Markov model
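The decoding step shared by HMM-based name recognizers, the Viterbi algorithm, can be sketched as below over B/I/O states (begin, inside, outside an organization name). The state set, the `unk` smoothing value for unseen tokens and the toy probabilities in the example are invented for illustration; the paper's semantic-annotation features and trained parameters are not reproduced.

```python
import math
from typing import Dict, List


def viterbi(obs: List[str], states: List[str], start: Dict[str, float],
            trans: Dict[str, Dict[str, float]], emit: Dict[str, Dict[str, float]],
            unk: float = 1e-6) -> List[str]:
    """Return the most probable state sequence for `obs`, computed in log space."""
    def lg(p: float) -> float:
        return math.log(p) if p > 0 else float("-inf")

    # best[t][s]: log-probability of the best path that ends in state s at time t.
    best = [{s: lg(start.get(s, 0.0)) + lg(emit[s].get(obs[0], unk)) for s in states}]
    back: List[Dict[str, str]] = []
    for o in obs[1:]:
        scores, ptr = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[-1][p] + lg(trans[p].get(s, 0.0)))
            scores[s] = best[-1][prev] + lg(trans[prev].get(s, 0.0)) + lg(emit[s].get(o, unk))
            ptr[s] = prev
        best.append(scores)
        back.append(ptr)
    # Trace back from the best final state.
    state = max(states, key=lambda s: best[-1][s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]
```

With a toy model in which "wuhan" is likely to begin a name and "steel" and "company" to continue one, the tokens `["visit", "wuhan", "steel", "company", "today"]` decode to `["O", "B", "I", "I", "O"]`; consecutive B/I labels are then joined into one organization-name span.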
11. Network Characteristics of Chinese Scientific and Technical Vocabulary System
Hu Changai and Zhu Lijun
(Institute of Scientific and Technical Information of China, Beijing 100038)
Abstract: Evaluation of traditional knowledge organization systems lacks macro-level measures, lags behind in time, and cannot reveal dynamic processes. Based on complex network theory, this paper analyzes the network properties of the Chinese science and technology vocabulary system in terms of basic characteristics, dynamic characteristics and robustness. The analysis of basic characteristics shows that the system is a small-world, scale-free network with good connectivity, although it contains errors and duplications. The analysis of dynamic properties shows that the small-world character of the system is even more pronounced, although network performance should be further improved. The robustness analysis shows that the system is highly robust. We therefore suggest that the vocabulary system be constructed under guidance, and that attention be paid to every term to ensure the connectivity of the system.
Keywords: Chinese scientific and technical vocabulary system, complex network, small world, scale-free, robustness
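A small-world finding usually rests on two statistics: a high average clustering coefficient and a short average path length relative to a random graph of the same size. A minimal sketch of both follows, assuming the vocabulary system is represented as an undirected graph of terms linked by thesaurus relations (the paper's exact network construction is not given here). Nodes with fewer than two neighbours contribute 0 to the clustering average, and path lengths are averaged only over connected pairs.

```python
from collections import deque
from itertools import combinations
from typing import Dict, Set

Graph = Dict[str, Set[str]]  # undirected graph as adjacency sets


def avg_clustering(g: Graph) -> float:
    """Mean local clustering coefficient; nodes with degree < 2 contribute 0."""
    total = 0.0
    for node, nbrs in g.items():
        k = len(nbrs)
        if k < 2:
            continue
        # Count edges among the node's neighbours.
        links = sum(1 for a, b in combinations(nbrs, 2) if b in g[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(g)


def avg_shortest_path(g: Graph) -> float:
    """Mean BFS distance over all ordered pairs of distinct, connected nodes."""
    dist_sum, pairs = 0, 0
    for src in g:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in g[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        dist_sum += sum(dist.values())
        pairs += len(dist) - 1
    return dist_sum / pairs if pairs else 0.0
```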
12. Research on Personalized Cross-language Academic Search
Pang Guansong1, Zhang Lisha2 and Jiang Shengyi3
(1.School of Management, Guangdong University of Foreign Studies, Guangzhou 510006;
2.Cardiff Business School, Cardiff University, CF10 3EU, Cardiff, United Kingdom;
3.School of Informatics, Guangdong University of Foreign Studies, Guangzhou 510420)
Abstract: The academic search engine is a domain-oriented search engine. However, because it lacks personalized services, literature retrieval is inefficient and massive digital academic resources are underused. Using Google Translate, this paper presents a Chinese, English, Russian, French and Spanish cross-language academic search engine based on machine translation. On this foundation, we study personalized information retrieval techniques and propose a clustering-based personalized retrieval approach: based on the user's clicks on the clusters produced by search-result clustering, it generates and updates a real-time user profile, computes the cosine similarity between the profile and each search result, and finally re-ranks the search results by similarity. Experimental results show that the proposed approach is effective and accepted by users.
Keywords: cross-language information retrieval, personalized information retrieval, one pass clustering, academic search, user click behavior
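The profile update and cosine re-ranking steps can be sketched with sparse term-weight dictionaries. The exponential `decay` applied to old interests, the raw term-count weighting and all function names are assumptions made for illustration; the abstract does not specify how the profile is weighted.

```python
import math
from collections import Counter
from typing import Dict, List


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity of two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def update_profile(profile: Dict[str, float], clicked_terms: List[str],
                   decay: float = 0.8) -> Dict[str, float]:
    """Decay old interests, then add the terms of the cluster the user just clicked."""
    new = {t: w * decay for t, w in profile.items()}
    for t, n in Counter(clicked_terms).items():
        new[t] = new.get(t, 0.0) + n
    return new


def rerank(results: List[Dict[str, float]], profile: Dict[str, float]) -> List[int]:
    """Return result indices ordered by descending cosine similarity to the profile."""
    return sorted(range(len(results)), key=lambda i: cosine(results[i], profile), reverse=True)
```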
13. Network Based Users Book-Loan Behavior Analysis: A Case Study of Peking University Library
Yan Fei1, Zhang Ming1, Sun Tao1 and Xiao Long2
(1.School of EECS, Peking University, Beijing 100871; 2. Library of Peking University, Beijing 100871)
Abstract: Book loan is the most important service in libraries. At Peking University, for example, almost every student has borrowed books from the library. It is therefore essential to understand users' book-loan behavior and to provide better user-oriented services based on that understanding. There are two kinds of networks in libraries: the book-borrowing network and the co-borrowing network. In the book-borrowing network, a user and a book are connected if the user borrowed the book. In the co-borrowing network, two users are connected if they borrowed the same books; this network can also be regarded as a knowledge-sharing network. In this paper, we analyze users' book-loan behavior in these two networks, gain new understanding of user behavior, and apply the results to improve library services. Our research is in line with the Library 2.0 trend.
Keywords: user behavior analysis, social network analysis, digital library, log mining
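The co-borrowing network described above is a one-mode projection of the bipartite book-borrowing network. A minimal sketch follows; weighting each user-user edge by the number of distinct books both users borrowed is an assumption beyond the abstract, which only states that such users are connected, and `co_borrowing` is a hypothetical name.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Tuple


def co_borrowing(loans: Iterable[Tuple[str, str]]) -> Dict[FrozenSet[str], int]:
    """Project (user, book) loan records onto a weighted user-user network.

    Two users are linked if they borrowed at least one common book; the weight is
    the number of distinct books they both borrowed (repeat loans count once).
    """
    borrowers = defaultdict(set)  # book -> set of users who borrowed it
    for user, book in loans:
        borrowers[book].add(user)
    weights: Dict[FrozenSet[str], int] = defaultdict(int)
    for users in borrowers.values():
        for a, b in combinations(sorted(users), 2):
            weights[frozenset((a, b))] += 1
    return dict(weights)
```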
14. Visualization Analysis of the Research Fronts Based on CiteSpaceII
Yang Liangxuan, Li Zili and Wang Hao
(School of Information System and Management, National University of Defense Technology, Changsha 410073)
Abstract: Research fronts are emerging thematic trends and surges of new topics; identifying them can provide researchers with the latest information they need. We first briefly introduce existing methods for detecting research fronts, then use the visualization software CiteSpaceII to depict co-citation relationships and identify critical articles and important researchers. We also discuss the shortcomings of the software and forecast future applications of research front analysis.
Keywords: research fronts, fronts analysis, CiteSpaceII
15. Study on Focuses in Pervasive Computing Based on National Top-level Domains
Huang Lucheng and Zhao Pan
(School of Economics and Management, Beijing University of Technology, Beijing 100124)
Abstract: With the development of network technology and massive data, online information has become an important information resource that cannot be ignored: it contains a great deal of technology-related information and also subtly guides and influences technological development. Based on the current state of research in this area, we propose analyzing focus technologies and their future development directions from Internet information using national top-level domains. By applying hierarchical clustering to compare countries within and between groups, we can identify the technologies that various countries currently share an interest in and their future development directions. Pervasive computing technology is used as an example for empirical analysis.
Keywords: national top-level domain, hot technology, cluster analysis, pervasive computing technology