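Sharing 《镜子大全》 and 《朝华午拾》 http://blog.sciencenet.cn/u/liwei999 Once a Little Red Guard, then sent down to the countryside to "repair the earth"; left home and country in 1991, destination unknown.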


Social media mining on the credit industry in China

3688 reads | 2014-9-21 02:56 | Personal category: Social media mining | System category: Research notes | Keywords: Social, Mining, credit, text, media

The purpose of this investigation is to collect public opinion from Chinese social media on one of the most important sectors of China's financial world: credit cards and their associated issues. Well-known brands such as Ali Pay and Citi Bank are analyzed in this context.

We all know China has seen continuous economic growth over the last three decades, unprecedented in human history. Just about 15 years ago, Chinese people had rarely heard of credit cards, online payment or personal credit: it all seemed remote, and almost all transactions were in cash. Look at today. Look at the incredible IPO of Alibaba, which (among others) helped build the concept and practice of credit and online payment for a client base involving over a billion people.

The Chinese market is important for international banks too. Hence an accurate Chinese social media analysis of critical financial topics such as credit cards will help them in their Chinese business as well. This study, which uses real-life automatic mining of social media big data, shows that we have state-of-the-art Chinese NLP (Natural Language Processing) technology that really works; in fact, it is the only real-life, fully automatic Chinese deep analysis system available in industry that has been scaled up to the entire social media. Despite the anarchy and all kinds of jargon and ungrammaticality in Chinese social media, our system is able to make good sense of the massive data and uncover the true intelligence behind it, including public opinions and sentiments and, more importantly, the underlying motivations behind those opinions and sentiments. The exercise below gives a flavor of that.

We have defined two related category topics for study: credit card (信用卡) and credit card fraud (信用卡欺诈, including all types of security issues). We believe these topics are of general interest to people in the financial world.

The above summary represents one year of Chinese social media data, from 9/15/2013 to 9/15/2014. It contains very limited Weibo data due to data cost constraints, but includes almost all other Chinese social media sources such as 天涯 (Tianya), 豆瓣 (Douban), 百度贴吧 (Baidu Tieba), 淘宝 (Taobao), etc. WeChat (微信) is excluded because it is largely private data, not open to anyone for public mining and analysis (fortunately or unfortunately). As the summary shows, the topic “credit card” is mentioned 1.4 million times and “credit card fraud” 139k times, about 1/10 of the former. This indicates that fraud is indeed a significant subtopic of credit cards that people are concerned about.

Also noticeable in the summary are the associated net-sentiment measures (a metric representing the ratio of positive comments versus negative comments, an indicator of the public image of a brand or topic in people's minds as represented by social media): 28% for “credit card” and -41% for “credit card fraud”. Based on our past metrics for different brands and topics, 28% is fair for a neutral category topic, and it shows that people still like and adopt credit cards despite some concerns about them. -41% is a very negative net sentiment, which is natural in this case because the fraud topic we are investigating is itself negative.
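To make the net-sentiment numbers more concrete, here is a minimal Python sketch. The post does not spell out the exact formula, so this assumes one common definition, (positive - negative) / (positive + negative); the function name `net_sentiment` and the comment counts are hypothetical, chosen only so that the results match the reported 28% and -41%.

```python
def net_sentiment(positive: int, negative: int) -> float:
    """Net sentiment in percent, ASSUMED here to be
    (positive - negative) / (positive + negative) * 100.
    The actual metric used by the system may differ."""
    total = positive + negative
    if total == 0:
        # No sentiment-bearing comments at all: treat as neutral.
        return 0.0
    return 100.0 * (positive - negative) / total


# Hypothetical counts, not taken from the study's data.
credit_card = net_sentiment(positive=64, negative=36)
credit_card_fraud = net_sentiment(positive=295, negative=705)

print(f"credit card: {credit_card:.0f}%")
print(f"credit card fraud: {credit_card_fraud:.0f}%")
```

Under this definition the score ranges from -100% (all negative) to +100% (all positive), with 0% meaning positive and negative comments are balanced.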

In the Timeline trends graph above, we can see the topics' ups and downs over the year in Chinese social media. The topics appear to have been hot near the end of 2013 and around March and April 2014. If needed, we can drill down to show which events caused these spikes in social media.

The next graph on Crosstab shows our association analysis of the category topics with some known brands we chose to investigate:

支付宝 (Ali Pay, Alibaba’s famous payment system, China’s PayPal)
建行 (China Construction Bank)
Citi (花旗)
HSBC (汇丰)
Deutsche Bank (德银)

The category association analysis gives a quick view of how seriously an issue is associated with a brand and how one brand compares with others on that issue. If an issue is serious, we should drill down to analyze what is going on behind the numbers. Our system has handy tools and widgets for drilling down to the relevant data at will, and for looking at the data from different perspectives under different constraints to reveal cause-effect or other insightful relationships.

For the credit card fraud topic, the table below shows that the two Chinese brands are deeply involved in the issue, with more concerns about Alibaba’s system. More specifically, we have 5k mentions related to some type of fraud out of 67.7k topic data for Alibaba’s payment system Ali Pay, and 1.8k mentions out of over 100k topic data for China Construction Bank. This makes sense, as Alibaba’s system handles online payment exclusively, with so many transactions by so many online stores that it seems more subject to fraud events. As for Citi, the situation is not bad: 166 mentions out of 6990 credit topic data, which is very comparable to HSBC with 114 mentions out of 5629 topic data. That is the overall picture of the issue across brands.
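The brand comparison above can also be expressed as the share of each brand's topic data that mentions fraud. The following Python sketch uses only the counts quoted in the text; the names `brands` and `fraud_share` are hypothetical, and the figures for Ali Pay and China Construction Bank are rounded in the post ("5k", "1.8k", "over 100k"), so the resulting shares are approximate.

```python
# (fraud mentions, total topic mentions) as quoted in the text.
# Ali Pay and China Construction Bank figures are rounded in the post,
# so their shares are only approximate.
brands = {
    "Ali Pay": (5_000, 67_700),
    "China Construction Bank": (1_800, 100_000),
    "Citi": (166, 6_990),
    "HSBC": (114, 5_629),
}


def fraud_share(fraud_mentions: int, topic_mentions: int) -> float:
    """Percentage of a brand's topic mentions that relate to fraud."""
    return 100.0 * fraud_mentions / topic_mentions


shares = {name: fraud_share(f, t) for name, (f, t) in brands.items()}

# Print brands from highest to lowest fraud share.
for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {share:.1f}%")
```

By this measure Ali Pay comes out at roughly 7%, while the other three brands sit at around 2%, so the gap between Ali Pay and the banks is visible in relative terms as well as in absolute counts.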

The next two graphs are Word Clouds of the themes and emotions related to the topics. The major sentiments on credit cards are positive: many Chinese consumers talk about “support” (支持), “like” (喜欢), “use” (用), “trust” (信赖), and “enjoy” (享受) with regard to credit cards, and generally regard them as “good stuff” (好东西). Negative sentiments are far fewer, including “NOT support” (不支持), “NOT accept” (不接受) and “does not work” (不行). The subtopic “credit card fraud” is associated, quite naturally, with lots of “worry” (担心), “[handled] not well” (不善), “high risks” (危险) and “issues” (问题).

Unlike many other teams who claim to do sentiment analysis, our system does not just mine emotional sentiments; it can also reveal the reasons behind them: why people like or dislike something. This type of insight is far more complicated to extract, as there are thousands of possible reasons, whereas there are only a few major sentiment types such as positive or negative (or neutral) and maybe a dozen sub-types such as hate, anger, disappointment, love, like, thankfulness, or mixed feelings. However, the uncovered reasons and motivations behind the sentiments are far more valuable and actionable for business decision making. This is shown in our Likes and Dislikes clouds and pie charts below. There are lots of interesting insights here, and some may be worth drilling down into for further analysis using our tool. Let us focus on the top insights.

From the pie charts, we see that the top reasons why people like credit cards are: 方便 (convenience), 优惠 (promotions), 行 (works). The top dislikes are: 被盗 (stolen), 逾期 (overdue payment), 诈骗 (fraud), 费 (fees), 伪造 (fake). These all seem to be common sense. The point is that these factors can change over time in weight and order, reflecting the social sentiments and consumers’ opinions and concerns at the time. For example, “promotions” (优惠) are almost as important as “convenience” in consumers’ social talk as top reasons for using credit cards. This confirms that the incentives in credit card promotion campaigns must have worked, and that there are good reasons to keep promoting. On the negative side, we see that almost 50% of the top 10 dislikes are related to some type of fraud and about 30% to concerns about fines and fees. This type of insight and comparison is exactly what credit card companies are looking for, as they need to address such concerns in order of priority.

In general, the results look really impressive and the quality is good. If you are interested, we can drill down to the details interactively in our live demo.



