Hidden shadows of the internet – Using AI to detect malicious behavior on the web

While browsing the online profile of Prof. Jure Leskovec, I found several informative white papers on different applications of AI. The one that interested me most was the tutorial ‘Malicious Behavior on the Web: Characterization and Detection’ by Srijan Kumar, Justin Cheng and Jure Leskovec. I found this tutorial relevant and valuable because it covers aspects of the internet that we encounter every day and may not be aware of. So, in this blog, I am going to give a simple, non-technical summary of the tutorial. The first part covers antisocial users of the web such as trolls, sock puppets and vandals. The second part covers misinformation such as hoaxes, rumors and fraudulent reviews.

Today, anyone can create and share content and opinions on the web in the form of blogs, social media, wikis, reviews and forums. Not everyone has good intentions, though: information is often biased and twisted to achieve a particular goal, and not everything you see online is true. On social media, people are exposed to fake news, and without fact-checking a huge population ends up misinformed. I cannot count how many times Queen Elizabeth or other celebrities have been ‘killed’ on the internet. Fake news, because of its ‘attractive content’, spreads faster than a forest fire through instant, exponential sharing.

What are some of the malicious behaviors on the internet, and how can they be detected?

  1. Trolls

Trolls post hateful messages, mostly as comments or reviews, in order to attract attention, redirect users, or incite arguments. It is true that the worst parts of humanity are revealed in comment sections. A typical troll comment would be ‘you get out of MY country, you f***ing a*******’. Research has associated trolling with personality traits such as sadism, psychopathy, narcissism and Machiavellianism; however, this is not the whole picture, as trolling can also be induced by a negative environment.

Trolling can be detected by features such as:

  • frequent use of swear words
  • a high percentage of negative words
  • low similarity to neighboring comments
  • more replies received than other comments
  • escalation under negative conditions
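Features like these could, in principle, feed a simple automated scorer. Here is a minimal illustrative sketch in Python; the word lists, weights and thresholds are all invented for illustration and are not from the tutorial:

```python
# Toy rule-based scorer for troll-like comments.
# Lexicons, weights and the cap below are illustrative assumptions only.

SWEAR_WORDS = {"idiot", "stupid", "moron"}            # placeholder lexicon
NEGATIVE_WORDS = {"hate", "awful", "terrible", "worst"}

def troll_score(comment: str) -> float:
    """Return a score in [0, 1]; higher means more troll-like."""
    words = [w.strip(".,!?") for w in comment.lower().split()]
    if not words:
        return 0.0
    swear_ratio = sum(w in SWEAR_WORDS for w in words) / len(words)
    negative_ratio = sum(w in NEGATIVE_WORDS for w in words) / len(words)
    # Weighted combination of the two signals, capped at 1.0.
    return min(1.0, 3.0 * swear_ratio + 2.0 * negative_ratio)

print(troll_score("You are a stupid moron and I hate you"))
print(troll_score("Thanks, this was a helpful post"))
```

A real detector would of course learn such weights from labeled data and use many more signals (reply counts, comment similarity, context), but the idea of turning comment text into features is the same.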

Some suggestions to combat trolling are:

  • up-voting/down-voting – its quality is limited by bias, and it has longer-term negative effects: negative feedback feeds the trolls
  • bots to reduce harassment – ‘influential figure’ chat bots employed to reply to trolls with positive messages

  2. Sock puppets

Sock puppets are ‘peer accounts’ that post supportive content in a forum/discussion to create an illusion of support. They are characterized by:

  • Similar login times, IP addresses and usernames
  • Similar writing styles and point of view
  • Supportive (echoing) content/voice

AI systems can identify sock puppets because they:

  • Start fewer discussions
  • Interact and agree with each other more
  • Write shorter sentences
  • Address others directly
  • Write more self-centered posts
  • Are more opinionated
  • Use similar usernames or email IDs
  • Are down-voted more but up-vote each other more
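One of these signals, username similarity, is easy to picture. A minimal sketch using Python's standard-library `difflib` (the 0.8 cutoff and the example usernames are arbitrary assumptions, not values from the tutorial):

```python
from difflib import SequenceMatcher

def similar_usernames(a: str, b: str, cutoff: float = 0.8) -> bool:
    """Flag a pair of usernames whose character overlap exceeds the cutoff."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= cutoff

print(similar_usernames("bookfan_99", "bookfan_98"))   # near-identical names
print(similar_usernames("bookfan_99", "quietreader"))  # unrelated names
```

On its own this would flag many innocent coincidences; in practice it would be combined with the other signals above (login times, IP addresses, mutual up-voting) before an account pair is treated as suspicious.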

  3. Vandals

These users deliberately alter content on the internet. They target websites that can be freely edited, have large reach and are depended upon by many users such as Wikipedia. 7% of edits on Wikipedia are vandalism. Vandalism can be detected by:

  • Vocabulary differences
  • Persistent editing behavior
  • Highly visible edits
  • Very fast editing
  • No participation in discussions

In detecting vandals, a combination of metadata, text and human feedback gives the best results.

  4. Fake Reviews

I have fallen victim to this in my industry. New and upcoming service providers pay users to post fake reviews about them online, with the aim of increasing conversion rates. Fake reviews are characterized by:

  • They are more opinionated
  • Reviewers post few, short reviews
  • They collude in many different clusters
  • They down-vote good products and up-vote bad ones
  • Their reviews follow linear, targeted patterns

In detecting fake reviews, user behavior also plays an important role.

  5. Hoaxes

This is deliberately fabricated content with the aim of misleading readers – disinformation. Hoaxes mainly come as fake news and claims that may be difficult for casual readers to detect. Skilled hoax creators can make genuinely credible-looking content. Hoaxes are characterized by:

  • Viral, quick spreading
  • More re-shares at greater depths
  • Long titles
  • Simple, repetitive text
  • Fewer links and references
  • Mentions that are recent and come from similar IP addresses
  • Creators who are newly registered, less experienced editors
