Illicit Promotion on Twitter (2024)

Hongyu Wang1*, Ying Li2*, Ronghong Huang1, Xianghang Mi1
1 University of Science and Technology of China   2 University of California, Los Angeles
https://illicit-promotion.github.io/

Abstract.

In this paper, we present an extensive study of the promotion of illicit goods and services on Twitter, a popular online social network (OSN). This study is made possible through the design and implementation of multiple novel tools for detecting and analyzing illicit promotion activities as well as their underlying campaigns. As a result, we observe that illicit promotion is prevalent on Twitter, with a noticeable presence on three other popular OSNs: YouTube, Facebook, and TikTok. In particular, 12 million distinct posts of illicit promotion (PIPs) have been observed on the Twitter platform, widely distributed across 5 major natural languages and 10 categories of illicit goods and services, e.g., drugs, data leakage, gambling, and weapon sales. We also observe 580K Twitter accounts publishing PIPs as well as 37K distinct instant messaging (IM) accounts that are embedded in PIPs and serve as next hops of communication, which strongly indicates that the campaigns underpinning PIPs are also of a large scale. Furthermore, an arms race between Twitter and illicit promotion operators is observed. On one hand, Twitter conducts content moderation in a continuous manner, and almost 80% of PIPs are gradually unpublished within six months of being posted. In the meantime, however, miscreants adopt various evasion tactics to masquerade their PIPs, which allows more than 90% of PIPs to stay under the detection radar for two months or longer.

* These authors contributed equally to this research.

Conference: ACM Conference on Computer and Communications Security (CCS), Los Angeles, U.S.A.; journal year: 2022.


1. Introduction

Traditionally, illicit goods and services are promoted either offline or through anonymous online marketplaces (Soska and Christin, 2015), e.g., the Silk Road, which ran as an onion service. However, these channels tend to have a very constrained audience base (Christin, 2013), and are thus not feasible for promoting illicit products to regular online users at scale. To reach a wider customer base, especially regular online users, alternative promotion techniques have been developed and adopted by miscreants. One typical example is the search engine poisoning attack (John et al., 2011; Liao et al., 2016a), which involves injecting illicit promotion into benign but vulnerable websites and misleading search engines into indexing the poisoned webpages with a high page rank. However, since a compromised website can be recovered quickly (Leontiadis et al., 2014), miscreants have to continuously identify and compromise new websites to maintain the magnitude of their promotion campaigns.

[Figure 1. Two PIP examples published on Twitter: (a) a drug-trading post; (b) a post advertising forged certificates and photo IDs.]

Instead, illicit promotion on online social networks (OSNs) has traditionally been considered either infeasible or uncommon, since OSNs typically enforce strict content moderation against accounts and posts. However, this is no longer the case. In this study, we observe that posts of illicit promotion (PIPs) are being distributed at a concerning scale on Twitter, a major online social network platform. Figure 1 presents two PIP examples published on the Twitter platform. One post (Figure 1(a)) is intended to promote drug trading: the attached image shows some handwritten Thai words and several bags of products that look like heroin. The other post (Figure 1(b)) advertises the service of forging certificates and photo IDs: the main text is masqueraded as a benign English sentence, while both the Chinese username and the attached images clearly promote a fake certificate service. Motivated by such examples, this study aims to gain an in-depth understanding of illicit promotion on Twitter, focusing on the following research questions. First of all, what illicit goods and services are being promoted on Twitter? Second, how can posts of illicit promotion evade Twitter's content moderation and get published? Third, how do the underlying operators of illicit goods and services communicate with the victims (potential customers) exposed to their PIPs?

To answer these research questions, multiple technical challenges must be tackled. Firstly, given limited access to the Twitter platform, it is challenging to identify PIPs with good coverage. Also, there are no existing tools that can accurately distinguish PIPs from benign Twitter posts (i.e., tweets), not to mention classify PIPs into well-known categories of illicit goods and services. Furthermore, cybercrime operators tend to embed contacts in PIPs to facilitate next-step communication with victims. However, such contacts are of diverse categories and are typically presented in an evasive manner that renders automatic recognition error-prone while keeping them human-readable. We address these technical challenges through two novel tools. One is the PIP hunter, which can not only efficiently search the Twitter platform with known PIP keywords (e.g., relevant hashtags), but also classify whether a given tweet is a PIP using machine learning, and snowball this hunting process by automatically generating novel PIP-relevant keywords. The other tool is designed to gain an in-depth understanding of captured PIPs and their underlying campaigns; we thus name it the PIP analyzer. It consists of a multiclass classifier to classify PIPs into well-defined categories of illicit goods and services, a PIP contact extractor based on named entity recognition (NER), and a clustering module designed to group PIPs into their underlying campaigns of illicit goods and services. Leveraging this novel toolchain, we have conducted an extensive study of illicit promotion on Twitter. Below, we highlight the key findings of this study.

First of all, illicit promotion on Twitter features a large scale, diverse categories and products, and a wide distribution across natural languages and Twitter accounts. Specifically, in total, we have captured 12,401,082 distinct PIPs as well as 580,530 Twitter accounts. These Twitter accounts either publish PIPs (i.e., PIP accounts) or promote illicit goods and services in their personal profiles. Besides, the captured PIPs are widely distributed across multiple natural languages and fall into 10 well-defined categories of illicit goods and services. The top categories with the most PIPs are p*rn & sex services (69.78%), gambling (13.34%), illegal drugs (8.04%), money laundering (4.09%), and data leakage (2.03%). Also, within each category, a diverse set of specific products has been observed, e.g., methamphetamine and marijuana in the category of illegal drugs, and ID cards and passports in the category of data theft and leakage.

Besides, illicit promotion is not limited to Twitter, but turns out to be a cross-OSN issue. In particular, we have observed the extensive distribution of PIPs on three other popular OSN platforms: Facebook, TikTok, and YouTube. When searching these OSNs with sampled PIP keywords covering all ten categories, 35% of the keywords returned one or more PIPs among the top search results on YouTube, while the figure is 83% on Facebook and 52% on TikTok. Also, some PIPs observed across OSNs have the same contacts embedded, which suggests that they belong to the same underlying campaign and that some miscreants are promoting their illicit goods and services across OSN platforms. Such an observation also highlights the necessity of cross-OSN collaboration in mitigating illicit promotion.

Then, an arms race is observed between illicit promotion campaigns and Twitter content moderation. On one hand, various evasion techniques have been adopted by PIP operators, e.g., the use of various jargon words, composing PIPs with multilingual characters, and masquerading the tweet text as benign while injecting illicit promotion elements into usernames, media files, or even poll options. The adoption of diverse evasion techniques may explain why over 90% of PIPs can survive the first two months after being published. On the other hand, Twitter carries out continuous content moderation, and almost 80% of PIPs are banned within six months of publication, as learned through periodically revisiting captured PIPs and checking their availability.

Furthermore, when it comes to further communication with victims exposed to PIPs, most PIP operators prefer instant messaging platforms over Twitter itself, especially end-to-end encrypted ones. As a result, we have extracted 37,621 instant messaging accounts from PIPs, including 9,644 Telegram accounts, 11,561 WeChat accounts, 12,702 QQ accounts, 225 WhatsApp accounts, and 3,489 LINE accounts. Also, miscreants underpinning PIPs of different categories vary a lot in their preferred instant messaging platforms, e.g., operators of money laundering or weapon sales prefer Telegram, while data leakage services prefer WeChat.

Our contributions can be summarized as below.

• To the best of our knowledge, this is the first extensive security study of illicit promotion on the Twitter platform, which has distilled a set of previously unknown security findings.

• Two novel tools have been designed and implemented to capture and analyze illicit promotion, along with a large dataset of captured PIPs and PIP contacts. We have made the tools and datasets available to the public to promote research in this domain (available at https://illicit-promotion.github.io/).

2. Background and Related Works

The promotion and communication of the underground economy. The underground economy, or the shadow economy, refers to the production, promotion, trade, and distribution of goods or services that are deemed illicit or even illegal. It ranges from traditional categories (such as drug trading and child p*rnography) to new ones such as hacking services and the trading of illegal data (Du et al., 2018). In many studies, the underground economy is called cybercrime, since many of its activities have either moved to the Internet or owe their existence to the Internet; we use both terms interchangeably in this study.

A long line of studies has profiled the underground economy from various aspects, among which a large portion are dedicated to a single cybercrime category, e.g., counterfeit or unlicensed pharmaceuticals (Kanich et al., 2011; Leontiadis et al., 2011; McCoy et al., 2012), drug trading (Aldridge and Askew, 2017; Li et al., 2021), illegal online gambling (Yang et al., 2019; Gao et al., 2021), and malware distribution (Caballero et al., 2011; Kotzias et al., 2016; Thomas et al., 2016; Kotzias et al., 2021), among others. Another line of work examines holistic infrastructures that promote diverse categories of illicit goods and services. Through these studies, various promotion channels have been identified and profiled. One prominent example is the anonymous markets that run as Tor hidden services. Christin (Christin, 2013) conducted a measurement study between 2011 and 2012 of one such anonymous marketplace, namely the Silk Road. The Silk Road marketplace was found to be mostly used for trading controlled substances and narcotics. Also, most items were on sale for less than three weeks, and only 564 sellers and 24,000 unique items were observed. Furthermore, Soska and Christin (Soska and Christin, 2015) moved forward and profiled the longitudinal evolution of anonymous marketplaces between 2013 and 2015. Still, most sales were found to be drug-related.

To reach a wider customer base, especially regular Internet users, miscreants have also abused or even compromised popular online services. One typical and well-studied example is promotional infections, or search poisoning attacks (John et al., 2011; Leontiadis et al., 2014; Liao et al., 2016a), wherein the attacker compromises a legitimate website, injects promotional and harmful webpages, induces search engines to index these webpages with high page ranks, and ultimately exposes benign search users to the injected harmful webpages. Such promotional infections have been used to promote diverse cybercrime activities, e.g., online gambling (Liao et al., 2016a) and unlicensed pharmacies (Leontiadis et al., 2014). As revealed by (Leontiadis et al., 2014), the median time to recover from such promotional infections is around 15 days. To detect promotional infections, John et al. (John et al., 2011) proposed detecting infected webpages by looking at their URL parameters instead of visiting the webpages. Besides, Liao et al. (Liao et al., 2016a) utilized the semantic inconsistency between the injected cybercrime content and the legitimate context of the infected website to decide whether a webpage is a promotional infection.

Besides, in gaining a deep understanding of the underground economy, an important obstacle lies in the use of jargon words among sellers and buyers of illicit goods and services. Some research efforts (Zhao et al., 2016; Yang et al., 2017a; Yuan et al., 2018; Zhu et al., 2021) have thus been invested in exploring how to identify and understand such jargon words. Particularly, Yang et al. (Yang et al., 2017a) explored identifying jargon words from search engine keywords promoted in black hat SEO campaigns, while Yuan et al. (Yuan et al., 2018) detected whether a word is a jargon word by comparing its semantic discrepancy between cybercrime contexts and benign ones.

Moving forward from these studies on the underground economy, we have extensively explored, for the first time, how miscreants promote their illicit goods and services on popular online social networks, as well as how to address this concerning online abuse problem.

Spam in online social networks. The widespread adoption of online social networks (OSNs) is paralleled by various spamming activities, which include but are not limited to unsolicited advertisements, scams, and malware distribution. To fight such spamming activities, different detection methods have been developed and evaluated to detect spam messages (Gao et al., 2012; Xu et al., 2016; Zheng et al., 2019), spam accounts (spammers) (Benevenuto et al., 2010; Yang et al., 2013; Miller et al., 2014), and spam URLs (Thomas et al., 2011a). Particularly, Gao et al. (Gao et al., 2012) considered spam detection as a two-step task in which messages are first grouped into clusters and the resulting message clusters are further classified as spam campaigns or not. Instead, Miller et al. (Miller et al., 2014) adopted anomaly detection methodologies for spam detection. In addition to spam detection, many studies (Grier et al., 2010; Thomas et al., 2011b, 2013) have also distilled novel findings regarding spam activities on OSNs. Particularly, Grier et al. (Grier et al., 2010) revealed in 2010 the distribution of spam URLs on Twitter, and found that 8% of URLs posted on Twitter pointed to malicious websites (e.g., malware and scams), which however had a clickthrough rate as low as 0.13% due to Twitter's anti-spam measures. Besides, Thomas et al. (Thomas et al., 2013) uncovered the trade of fraudulent Twitter accounts in the underground economy and the important role these accounts played in spamming activities on Twitter.

Different from these studies, this study focuses on illicit promotion on OSNs (especially Twitter). Also, in contrast to OSN spamming, illicit promotion aims to advertise illicit goods and services rather than phish or scam the victims.

Illicit promotion on OSNs. Previous works have also made efforts to reveal illicit promotion on OSNs, but most of them focus on a specific category or entity instead of comprehensive detection and understanding. For example, the authors of (Kalyanam and Mackey, 2017; Katsuki et al., 2015) paid attention to illicit pharmacies on Twitter, and (Yuan et al., 2019) studied adversarial p*rnography images crawled from popular OSNs along with the underground business behind them. In this paper, we step forward to build a general toolchain to detect and analyze PIPs, with efficacy demonstrated for diverse PIP categories and multilingual PIP instances. Furthermore, most findings distilled from this study apply to the ecosystem of PIPs as a whole rather than being limited to a specific PIP category.

Language models. A language model (langmodel, 2023) is a probability distribution over a set of natural language words, and many NLP tasks use DNN-based language models such as word2vec (Mikolov et al., 2013) and the transformer-based BERT (Devlin et al., 2018). Such a DNN-based language model is usually trained on a large corpus of unlabeled text documents through self-supervised training tasks such as masked language modeling, i.e., predicting a missing word in a given sentence. The hidden layers of such language models can represent natural language words well in a fixed-size, high-dimensional vector space, and are thus commonly used for text encoding, i.e., word embedding. However, training a large and general-purpose language model is both time-consuming and computing-intensive, which motivates the emergence and increasing adoption of the pre-training and fine-tuning paradigm. In this paradigm, to fulfill an NLP task, rather than building a DNN model from scratch, a general-purpose pre-trained language model is first adopted and then fine-tuned on a labeled dataset that is specific to the given NLP task and can be small-scale. This paradigm has achieved strong performance on numerous NLP tasks (Devlin et al., 2018; Koroteev, 2021; González-Carvajal and Garrido-Merchán, 2020).

In this study, to facilitate the detection and understanding of PIPs, we have adopted the pre-training and fine-tuning paradigm in multiple text classification tasks, e.g., binary PIP classification, multiclass PIP classification, and PIP contact recognition, as detailed in §3. Particularly, across these NLP tasks, we adopt multilingual BERT (mul, 2023) as the pre-trained language model, which is built through two self-supervised training tasks, namely masked language modeling and next sentence prediction, on an unlabeled Wikipedia corpus containing numerous text documents written in 102 natural languages.

3. Methodology

In this section, we present the details of our methodology to capture and analyze posts of illicit promotion (PIPs) on Twitter. As illustrated in Figure 2, this methodology comprises two key modules. One is the PIP hunter, an automatic pipeline to capture illicit promotion tweets and collect the relevant Twitter accounts, as detailed in §3.1. The other is the PIP analyzer (§3.2), which is designed to profile PIPs with regard to their categories of illicit goods and services, next-hop contacts, and underlying campaigns.

[Figure 2. An overview of the methodology for capturing and analyzing PIPs.]

3.1. The PIP Hunter

To hunt PIPs on Twitter, a straightforward method is to inspect every tweet, which however requires unlimited access to the Twitter platform. Instead, our PIP hunter is designed to efficiently search Twitter with keywords that are relevant to PIPs. Therefore, our PIP hunter can not only be used by the Twitter platform for internal inspection, but also serve as an effective tool for any third party to audit illicit promotion on Twitter with only limited access to Twitter data. At a high level, the PIP hunter consists of a cycle of four steps. To start, it searches Twitter with PIP-relevant keywords. This is followed by a binary PIP classifier that takes a multilingual tweet text as input and decides whether it is a PIP or not. Given the identified PIPs, the third step is to evaluate the quality of existing PIP keywords and exclude ones with a low PIP hit rate. Then, the last step is to generate keywords from newly captured PIPs and append them to the keyword set so as to boost the next round of PIP hunting. Below, we present more details step by step.
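The four-step cycle can be sketched as follows. This is a minimal, self-contained illustration, not the deployed system: `search_twitter` and `classify_pip` are toy stand-ins for the tweet crawler and the binary PIP classifier described below, and the sample posts are fabricated.

```python
# A minimal sketch of the PIP hunter's four-step cycle. The helpers
# below are toy stand-ins: a real deployment backs them with the
# Twitter search API and the fine-tuned binary PIP classifier.
import re

def search_twitter(keyword):
    """Stand-in crawler: return posts 'retrieved' for a keyword."""
    corpus = {
        "#ogkush": ["buy #ogkush today #WeedFactory", "nice sunset photo"],
        "#WeedFactory": ["#WeedFactory shipping worldwide"],
    }
    return corpus.get(keyword, [])

def classify_pip(post):
    """Stand-in binary PIP classifier (here: a trivial keyword rule)."""
    return any(w in post.lower() for w in ("ogkush", "weedfactory"))

def extract_hashtags(post):
    return set(re.findall(r"#\w+", post))

def hunt_round(keywords, known_pips, min_hit_rate=0.01):
    """One hunting round: search, classify, filter, generate keywords."""
    new_keywords = set()
    for kw in sorted(keywords):
        posts = search_twitter(kw)
        new_pips = [p for p in posts
                    if classify_pip(p) and p not in known_pips]
        # Step 3: drop keywords whose ratio of new PIPs is too low.
        if posts and len(new_pips) / len(posts) < min_hit_rate:
            keywords.discard(kw)
        for pip in new_pips:
            known_pips.add(pip)
            # Step 4: snowball new hashtag keywords from fresh PIPs.
            new_keywords |= extract_hashtags(pip)
    keywords |= new_keywords
    return keywords, known_pips

keywords, pips = hunt_round({"#ogkush"}, set())
```

Starting from a single seed hashtag, the round above both collects new PIPs and snowballs fresh keywords for the next round, while unproductive keywords are pruned.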

The tweet crawler. Given PIP-relevant search keywords, a tweet crawler is deployed to query Twitter using its searching APIs (https://developer.twitter.com/en/docs/twitter-api) so as to identify tweets and Twitter accounts that are relevant to PIP keywords. Here, the search keywords are either manually crafted in advance or automatically generated from the last round of hunting. Currently, our tweet crawler supports two types of search keywords: hashtags and Twitter accounts. The search strategy varies across the keyword types. For hashtag keywords, the standard Twitter search API is utilized to retrieve tweets relevant to the given hashtag. When the keyword is a Twitter account, the profile of the account is retrieved along with its latest tweets up to the crawling time. The number of tweets to retrieve for each account is configurable and may vary across deployment strategies. In our case, considering the empirical trade-off between collecting more tweets and avoiding repetitive crawling of the same tweets, we set it to 100, i.e., up to 100 of the latest tweets are crawled.
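As a sketch, the per-keyword-type dispatch might look like the following; the `client` object and its method names are hypothetical stand-ins for real Twitter API bindings, not actual endpoints.

```python
# Illustrative dispatch over the two supported keyword types.
# The client is a hypothetical stand-in for Twitter API bindings;
# its method names are assumptions made for this sketch.
MAX_TWEETS_PER_ACCOUNT = 100  # the empirical cap described above

def crawl_keyword(client, keyword):
    """Return (posts, profile) for a hashtag or an account keyword."""
    if keyword.startswith("#"):
        # Hashtag keyword: retrieve relevant tweets via search.
        return client.search(keyword), None
    # Account keyword: fetch the profile plus up to 100 latest tweets,
    # trading coverage against re-crawling the same posts.
    profile = client.get_profile(keyword)
    tweets = client.get_timeline(keyword, limit=MAX_TWEETS_PER_ACCOUNT)
    return tweets, profile

class FakeClient:
    """Tiny in-memory client used only to exercise the dispatch."""
    def search(self, kw):
        return [f"tweet mentioning {kw}"]
    def get_profile(self, user):
        return {"username": user}
    def get_timeline(self, user, limit):
        return [f"{user} tweet {i}" for i in range(min(3, limit))]

posts, profile = crawl_keyword(FakeClient(), "sn889")
```

For account keywords the profile is returned alongside the timeline, since, as noted below, account profiles themselves can carry illicit promotion and are also classified.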

Table 1. Distribution of the ground-truth dataset across languages and PIP categories.

Language    % Dataset     PIP Category                    % Dataset
English     40.65%        p*rnography                     44.47%
Chinese     35.48%        Illegal Drug                    11.45%
Japanese    8.92%         Gambling                        9.50%
Thai        2.64%         Money Laundering                8.96%
Italian     2.59%         Data Theft and Leakage          8.85%
German      2.35%         Crowdturfing                    5.10%
Spanish     2.15%         Harassment                      4.08%
Russian     1.81%         Weapon Sales                    2.50%
Korean      1.75%         Forgery and Fake Documents      2.28%
French      1.65%         Surrogacy                       1.50%
Others      1.31%

The binary PIP classifier. Given the tweets and Twitter accounts identified through the above searching process, the next step is to distinguish PIPs from benign tweets and benign accounts. As account profiles can also be used for illicit promotion, both tweets and account profiles are subject to binary PIP classification. This is achieved through a machine learning classifier which takes either a tweet or the profile of an account as raw input and gives a binary output regarding whether the given content is a PIP or not.

To build this classifier, several options have been extensively explored. The first option is a combination of feature embedding through TF-IDF and classification through classic algorithms such as SVM and Random Forest. The second option also considers only the text elements of a post; however, it is built by fine-tuning a transformer-based multilingual language model (bert-base-multilingual-cased (mul, 2023)), which has achieved state-of-the-art performance in many text classification tasks (Devlin et al., 2018). Besides, many posts, especially PIPs, have media files attached along with the text elements, and these media files (e.g., images or videos) may contain visual elements that are important for deciding whether the respective post is a PIP. Therefore, the third option we have explored is a multimodal classifier which takes both the visual modality and the text modality into consideration when classifying a post. Here, the text modality is encoded using the aforementioned multilingual language model, while the visual input is encoded using a pretrained ResNet-152 model (He et al., 2016).
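To make the first option concrete, its TF-IDF feature step can be sketched in a few lines of standard-library Python. The toy corpus below is fabricated; a real pipeline would compute such vectors with a library (e.g., scikit-learn's TfidfVectorizer) and feed them into an SVM or Random Forest rather than inspect them directly.

```python
# Toy TF-IDF embedding: the feature step of the classic-ML option.
# Term weight = term frequency in the document times the (natural-log)
# inverse document frequency across the corpus, so terms that appear
# in fewer documents receive higher weights.
import math
from collections import Counter

def tfidf_vectors(docs):
    """Map each tokenized document to a {term: tf-idf weight} dict."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))  # document freq.
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            t: (tf[t] / len(doc)) * math.log(n / df[t])
            for t in tf
        })
    return vectors

docs = [
    "cheap meds dm for menu".split(),
    "lovely weather in town".split(),
    "dm for menu fast shipping".split(),
]
vecs = tfidf_vectors(docs)
```

Here the rare token "lovely" receives a higher weight than "dm", which recurs across documents, illustrating why TF-IDF features can separate promotional boilerplate from ordinary chatter.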

Table 2. Five-fold cross-validation results of the binary PIP classifiers.

Model                         Precision    Recall     F1-Score
Text-Only Transformer         97.25%       96.04%     96.64%
Multi-Modality Transformer    96.56%       97.43%     97.00%
SVM                           95.29%       95.30%     95.29%
Random Forest                 95.08%       94.77%     94.81%

Labeling the ground truth. To train and evaluate these classification options, a ground-truth dataset was collected through an iterative labeling process. Specifically, the labeling process involves two labelers independently annotating samples and resolving conflicts periodically. An iterative process is followed for labeling: 1) searching Twitter with PIP-relevant keywords, which yields crawled tweets; 2) labeling a sampled subset of the crawled tweets to update the ground truth; 3) training a weak PIP classifier on the updated ephemeral ground truth; 4) applying the weak PIP classifier to predict the crawled tweets that are unlabeled, identifying both false positive and false negative predictions; and 5) updating the ground truth accordingly. Besides, when a sample is labeled, it is assigned not only the binary PIP class, but also one of the PIP categories listed in Table 3. This iterative process continues until no new PIP categories emerge, each PIP category is well represented in the ground truth, and the PIP classifier achieves good performance when evaluated on the crawled tweets. Furthermore, an inter-rater agreement rate over 90% is achieved across all labeling tasks. Particularly, for 1,000 samples from the ground truth, the agreement rate is 99.8%.

The final ground-truth dataset consists of 8,408 PIPs and 4,773 non-PIP posts that are diverse and representative in terms of categories and natural languages. As shown in Table 1, the samples in this ground-truth dataset are composed in 10 natural languages, while the positive samples (PIPs) belong to 11 distinct categories of illicit goods and services. One thing to note: when composing non-PIP posts, we consider only PIP candidates that are not true PIPs, rather than using regular tweets. This is based on the observation that such PIP candidates tend to sit closer to the decision boundary than regular tweets and can help train a more robust PIP classifier. Besides, despite having more PIPs than non-PIPs in the ground truth, we do not observe any negative impact on the binary classifier's performance, which is well demonstrated by the evaluation on wild tweets and crawled tweets. Instead, we observe that fewer PIPs can yield comparable performance; for instance, after removing 4K PIPs for balance, a binary PIP classifier trained on this dataset achieved a precision of 96.6% and a recall of 97.4%. The additional PIPs, on the other hand, help enhance the ground truth's diversity in categories and languages, and are used to train and evaluate the multiclass PIP classifiers introduced in §3.2.

Evaluation. Our evaluation of the PIP classifiers is three-fold: 1) 5-fold cross-validation on the ground truth; 2) evaluation on crawled tweets; and 3) evaluation on wild tweets randomly sampled from the Twitter Archiving Project.

Table 2 lists the results of the five-fold cross-validation. We can see that the text-only transformer-based classifier achieves a performance comparable to that of the multi-modality one, and in the meantime better than that of the classic classifiers. Besides, the multi-modality model incurs the extra cost of downloading and preprocessing the involved media files, while classic algorithms are inferior to transformer models in terms of data efficiency, multilingual support, automatic feature engineering, and generalizability. We thus choose the text-only transformer model as the default binary PIP classifier.

We further evaluated the selected PIP classifier on unlabeled tweets collected by our tweet crawler using PIP-relevant keywords. Given the prediction results, 500 positive predictions were sampled for manual validation along with 500 negative predictions. The manual validation reveals a precision of 94.20% and a recall of 100%. Then, to further evaluate the generalizability of our binary PIP classifier, we applied it to wild tweets, namely, multiple daily tweet snapshots from the Twitter Archiving Project (https://archive.org/details/twitterarchive), which archives the tweet stream on a daily basis at a sampling ratio of 1%. Again, given the prediction results, 500 positive predictions and 500 negative ones were sampled for manual validation, which reveals a precision of 96.8%, a recall of 99.4%, and an accuracy of 98.1%.

Filtering existing PIP keywords. Some existing PIP keywords may not always work well in terms of triggering new PIPs, so a filtering step is applied to remove such ineffective keywords. This is achieved through threshold-based filtering. Specifically, a metric named RCP_kw is defined as the ratio of new PIPs among all the posts retrieved for a given keyword kw in the current hunting round. If a keyword has an RCP_kw lower than a configurable threshold, it is added to a blocklist and will not be used in future rounds. However, if a blocked keyword has not been used for 4 or more rounds but gets extracted again by the keyword generator, it is unblocked and added back to the keyword set. The threshold of RCP_kw has been tuned to 1% in our deployment, which allows us to achieve a good trade-off between PIP coverage and hunting efficiency.
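The RCP_kw filter described above can be sketched as follows; the data structures (per-round keyword statistics, and a blocklist mapping each keyword to the round in which it was blocked) are illustrative assumptions, not the paper's implementation.

```python
# Sketch of the RCP_kw keyword filter. The 1% threshold and the
# 4-round unblock rule follow the description above; the data
# structures are illustrative.
RCP_THRESHOLD = 0.01   # minimum ratio of new PIPs per keyword
UNBLOCK_AFTER = 4      # idle rounds before a blocked keyword may return

def filter_keywords(stats, blocklist, current_round):
    """stats: {keyword: (new_pips, total_posts)} for this round.
    Blocks low-yield keywords and returns the keywords kept."""
    kept = set()
    for kw, (new_pips, total) in stats.items():
        rcp = new_pips / total if total else 0.0
        if rcp < RCP_THRESHOLD:
            blocklist[kw] = current_round  # remember when it was blocked
        else:
            kept.add(kw)
    return kept

def maybe_unblock(kw, blocklist, current_round):
    """Re-admit a blocked keyword that resurfaces after >= 4 idle rounds."""
    if kw in blocklist and current_round - blocklist[kw] >= UNBLOCK_AFTER:
        del blocklist[kw]
        return True
    return False
```

For example, a keyword yielding 0 new PIPs out of 50 retrieved posts falls below the 1% threshold and is blocked, but becomes eligible again if the keyword generator re-extracts it four rounds later.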

Searching keyword generator. Newly captured PIPs are further fed into the keyword generator to extract new keywords, which in turn are appended to the keyword set for the next round of PIP hunting. Through inspecting PIP posts and conducting manual searching experiments, we find that hashtags extracted from known PIPs can serve as good keywords for discovering new PIPs. Also, when a Twitter account has ever posted a PIP, it either has many historical PIPs or will post PIPs in the near future, and the same account may post PIPs of diverse categories. Based on these observations, we compose the search keywords from two sources. One is to extract all the hashtags from identified PIPs; such seeds are called hashtag keywords. The other is to collect Twitter accounts (users) that either have posted PIPs or have their account profiles detected as PIPs; such seeds are named account keywords. Many PIP keywords turn out to be effective in discovering previously unknown PIPs, and some of them can even help identify PIPs that belong to different categories, different natural languages, or different campaigns. For instance, searching with the Chinese hashtag keyword "广州线下" (Guangzhou In-Person) has identified 16,334 distinct PIPs which belong to 5 categories, are composed in 10 different natural languages, and have 107 distinct contacts extracted. Similar examples include the hashtag keywords ogkush and WeedFactory and the account keywords sn889 and 426buds.
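A minimal sketch of the two keyword sources might look like this; the PIP records and their fields (`text`, `author`) are illustrative assumptions, though the example hashtags and accounts are taken from the instances mentioned above.

```python
# Sketch of the keyword generator's two sources: hashtags found inside
# known PIPs, and the accounts that posted them. The record fields
# ('text', 'author') are illustrative, not the paper's schema.
import re

def generate_keywords(pips):
    """pips: iterable of {'text': ..., 'author': ...} records for posts
    already classified as PIPs. Returns (hashtag_kws, account_kws)."""
    hashtag_kws, account_kws = set(), set()
    for pip in pips:
        # Hashtag keywords: every hashtag embedded in a known PIP.
        # \w matches Unicode word characters, so CJK hashtags such as
        # #广州线下 are captured too.
        hashtag_kws |= set(re.findall(r"#\w+", pip["text"]))
        # Account keywords: the account that published the PIP.
        account_kws.add(pip["author"])
    return hashtag_kws, account_kws

pips = [
    {"text": "top grade #ogkush #WeedFactory dm now", "author": "426buds"},
    {"text": "menu updated #ogkush", "author": "sn889"},
]
hashtags, accounts = generate_keywords(pips)
```

Both keyword sets are then appended to the search pool, so each round of hunting seeds the next.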

Deployment. To jump-start our PIP hunter, PIP-relevant keywords were first extracted from the thousands of PIPs in the aforementioned ground-truth dataset. We then deployed this PIP hunter against the Twitter platform between November 1, 2022 and April 23, 2023. During the deployment, each round started with searching Twitter with keywords, and ended with new PIPs and new keywords discovered; the resulting new keywords were then fed into the next round along with existing ones. Also, to avoid placing a non-negligible burden on the Twitter servers, our crawler strictly followed the rate limit policies: it would suspend crawling when a rate limit was reached and would not restart until the rate limit was cleared. As the hunting process moved forward, keyword explosion emerged and conflicted with our limited access to Twitter; e.g., by March 9, 2023, we had 1,280,113 distinct keywords (405,932 hashtags and 874,175 accounts). Therefore, a random sampling strategy was then applied to limit the workload of our Twitter crawler to around 60K PIP keywords.

In total, we have scanned over 53 million tweets, and discovered 12,401,082 PIPs and 580,530 PIP accounts. Besides, despite being constrained by limited access to other OSNs, we have also verified the existence and concerning prevalence of PIPs on three other OSNs, as detailed in §4.

Applicability to other OSN platforms. We also evaluate whether this PIP hunter is applicable to other OSNs, including Facebook, YouTube, and TikTok. However, only manual evaluation was conducted, due to either strict rate limits or the unavailability of programmable searching APIs. As a result, we observe that PIP-relevant hashtags collected from one OSN (e.g., Twitter) turn out to be applicable to other OSNs, which suggests that our hashtag-based searching strategy can likely work for all these OSNs. Besides, when predicting posts collected from OSNs other than Twitter, the binary PIP classifier achieves a performance comparable to that for tweets. We thus believe our PIP hunter is capable of fulfilling the task of cross-OSN PIP hunting, and we leave further exploration of cross-OSN illicit promotion to future work.

3.2. The PIP Analyzer

To gain a deep understanding of PIPs and the underlying promotion campaigns, multiple analysis tools have been built as part of the PIP analyzer. These tools include a multiclass classifier to reveal what kinds of illicit goods and services are promoted in PIPs, a PIP contact extractor to retrieve from PIPs the embedded next hops used to communicate with illicit promotion operators, and a PIP cluster analyzer to group PIPs into clusters and thus help reveal the campaigns underpinning them.

Table 3. Categories of illicit goods and services promoted in PIPs.

Category                      Description
p*rnography                   Sexual services, and sexually explicit content, e.g., indecent images and videos.
Gambling                      Online or offline gambling services and products.
Illegal Drug                  Illegal drugs, e.g., addictive opioid drugs, and prescription drugs.
Surrogacy                     Surrogacy services, e.g., surrogate motherhood agencies.
Harassment                    Various harassment services, e.g., cyberbullying, stalking, call/sms bombers.
Money Laundering              Various money laundering services, e.g., money muling.
Weapon Sales                  The sale of weapons, e.g., P99, a semi-automatic pistol, and SR-16, a select-fire rifle.
Data Theft and Leakage        Services offering stolen sensitive datasets, or various hacking tools and services.
Forgery and Fake Documents    Services offering fake or forged documents, e.g., forged passports and fake diplomas.
Crowdturfing                  Services of illicit crowdsourcing, e.g., deceptive promotion of the popularity of posts or accounts.

The multiclass PIP classifier. Given identified PIPs, a multiclass PIP classifier is designed to assign each PIP to one of ten categories of illicit goods and services. These categories were derived from the aforementioned labeling process and are defined in accordance with previous works (Yang et al., 2017b; Du et al., 2016; Yang et al., 2021; Hong et al., 2022; Gomez et al., 2022; Liao et al., 2016b) and Twitter's policies (twi, 2023c). The full list of these categories is shown in Table 3 along with a short description of each. For each category, we observed a reasonable volume of samples when labeling PIPs. We also define an additional category, others, for PIPs that do not fit well into any of the aforementioned categories. Regarding naming, we try our best to make the category names self-explanatory while keeping them aligned with terms used in previous works (Yang et al., 2017a; Yuan et al., 2018; Cao et al., 2012; Lin, [n. d.]; Zhou et al., 2018; West and Bhattacharya, [n. d.]). Notably, 8 of the 10 categories are considered illegal under Twitter's safety and cybercrime rules (twi, 2023c). The two exceptions are gambling and surrogacy, likely because their legitimacy varies significantly across jurisdictions. However, we decided to include them as PIP categories for multiple reasons. In particular, promotion posts of both categories often target jurisdictions wherein they are illegal; for instance, 68.29% of surrogacy posts are in Chinese, while surrogacy is prohibited in China. Besides, multiple previous studies on illicit promotion also consider both categories (Yang et al., 2017a; Yuan et al., 2018; Cao et al., 2012; Lin, [n. d.]; Zhou et al., 2018; West and Bhattacharya, [n. d.]).

To build this classifier, the PIPs in the aforementioned ground truth dataset are reused. As with the binary PIP classifier, we explored both text-only and multimodal classification. For text-only classification, we fine-tuned bert-base-multilingual-cased (mul, 2023), a multilingual transformer-based language model. For multimodal classification, both the text and the visual modality of each PIP are taken as input: ResNet-152 is used to embed the visual input (i.e., an image), and bert-base-multilingual-cased is used to embed the text input. For both models, 80% of the ground truth dataset is used for training, and the remaining 20% is held out for testing. As listed in Table 4, both achieved strong performance across a set of well-acknowledged metrics, and we chose the text-only model as the default multiclass PIP classifier.

Table 4. Performance of the multiclass PIP classifiers.

Model                         Precision    Recall     F1-Score
Text-Only Transformer         98.82%       98.80%     98.80%
Multi-Modality Transformer    96.86%       97.73%     97.27%

The PIP contact extractor. Rather than directly communicating with potential victims on Twitter, PIP operators embed various contacts in PIPs as next hops for stealthy communication. These embedded contacts include both website URLs and account IDs of various instant messaging (IM) platforms; extracting them not only helps us gain a better understanding of the underlying campaigns, but also yields a valuable threat intelligence dataset for future mitigation efforts. Thus, a PIP contact extractor is designed to automatically inspect a PIP, recognize the contact types it carries, and extract the respective contact entities. Currently, our contact extractor supports the recognition of websites as well as accounts of five IM platforms: QQ, WeChat, Telegram, Whatsapp, and LINE, which were the most frequently embedded in PIPs as observed in our manual study.

Among these contacts, website URLs can be easily identified through regular expression matching. However, many URLs require further processing, for two reasons. On the one hand, some URLs redirect visitors to an IM account, e.g., a Telegram URL https://t.me/{account_id}; we refer to such URLs as IM URLs. These IM URLs are further processed to extract the respective IM type and IM account, which is again achieved by defining and applying regular expressions specific to each IM platform. For instance, WhatsApp's IM URL pattern is https://wa.me/{phone_number}, while LINE's is https://line.me/ti/p/{accountID}. On the other hand, due to the character limit per tweet, many URLs have been shortened through popular URL shorteners, e.g., https://bit.ly and https://tinyurl.com. Such shortened URLs are visited to recover the true URL of the final landing page. Note that a LINE URL can also be shortened as http://lin.ee/XXXXXXX, and a Whatsapp URL as http://wa.link/XXXXX. Once the true URL is recovered, the IM contacts are extracted.
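The per-platform matching described above can be sketched with straightforward regular expressions. The paper does not publish its exact patterns, so the regexes, host list, and function names below are illustrative, covering only the URL formats named in the text:

```python
import re

# Illustrative per-platform IM URL patterns (the paper's exact regexes
# are not published); each captures the account identifier.
IM_URL_PATTERNS = {
    "telegram": re.compile(r"https?://t\.me/([A-Za-z0-9_]{3,})"),
    "whatsapp": re.compile(r"https?://wa\.me/(\d{6,15})"),
    "line":     re.compile(r"https?://line\.me/ti/p/([A-Za-z0-9_\-~@]+)"),
}

# Shortener hosts mentioned above; such URLs must first be resolved
# (via an HTTP request) to their landing page before matching.
SHORTENER_HOSTS = ("bit.ly", "tinyurl.com", "lin.ee", "wa.link")

def extract_im_contacts(text: str) -> list[tuple[str, str]]:
    """Return (platform, account) pairs for IM URLs embedded in a PIP text."""
    contacts = []
    for platform, pattern in IM_URL_PATTERNS.items():
        for match in pattern.finditer(text):
            contacts.append((platform, match.group(1)))
    return contacts

def needs_unshortening(url: str) -> bool:
    """True if the URL points at a known shortener and must be resolved."""
    host = re.sub(r"^https?://", "", url).split("/")[0]
    return host in SHORTENER_HOSTS
```

For example, `extract_im_contacts("contact https://t.me/some_account")` yields `[("telegram", "some_account")]`, while `needs_unshortening("https://bit.ly/abc")` flags the URL for resolution first.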

While LINE and Whatsapp accounts can be extracted from IM URLs, instant messaging accounts of other types are typically embedded in PIPs as ID strings. Account extraction for these contact types is abstracted as a named entity recognition (NER) task: in a nutshell, each word of the PIP text is classified into one of the BIO (beginning, inside, and outside) tags specific to each contact type, where the contact types under consideration include Wechat, Telegram, QQ, and others. To build this NER classifier, 3,000 PIPs containing contacts were manually labeled with these tags. In total, the resulting ground truth dataset consists of 386 WeChat accounts, 1,254 Telegram accounts, 626 QQ accounts, and 192 other contacts.
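The BIO scheme can be illustrated with a minimal decoder that turns per-token tags back into contact entities. The concrete tag labels (e.g., B-TELEGRAM) and the function are our own sketch; the paper does not specify its exact label set:

```python
def decode_bio(tokens, tags):
    """Collect contact entities from per-token BIO tags.

    "B-<TYPE>" begins an entity, "I-<TYPE>" continues it, and "O" marks
    tokens outside any entity. The TYPE names mirror the contact types
    named above (WECHAT, TELEGRAM, QQ, ...) but are assumptions.
    """
    entities, current_type, current_tokens = [], None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current_type:  # close the previous entity
                entities.append((current_type, "".join(current_tokens)))
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_tokens.append(token)
        else:  # "O" or an inconsistent I- tag ends any open entity
            if current_type:
                entities.append((current_type, "".join(current_tokens)))
            current_type, current_tokens = None, []
    if current_type:
        entities.append((current_type, "".join(current_tokens)))
    return entities
```

For instance, tokens `["add", "me", "abc", "123"]` with tags `["O", "O", "B-WECHAT", "I-WECHAT"]` decode to a single WeChat account `abc123`.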

Before training and evaluating the NER classifier, a set of preprocessing steps turns out to be necessary. First of all, as aforementioned, IM accounts can be embedded in a PIP as IM URLs (e.g., Telegram URLs). Since these IM URLs can share similar semantic contexts with account IDs of the same IM type, they are likely to be misrecognized as account IDs. Therefore, before classification, each URL in a PIP text is replaced with a placeholder string url-x, where x denotes the URL's position in the URL list of the same PIP. Also, some PIPs embed emojis to denote the IM platform, e.g., an airplane emoji (U+2708) to denote Telegram. A further preprocessing step therefore replaces emoji symbols with their natural language descriptions, which is achieved through a Python library, pyemoji (pye, 2023).
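The two preprocessing steps can be sketched as follows. The reduced emoji map and the function name are illustrative (the paper relies on a full emoji library for the second step):

```python
import re

# Illustrative emoji-to-description map, reduced to the example above;
# the paper uses a full emoji library rather than a hand-built table.
EMOJI_DESCRIPTIONS = {
    "\u2708": " airplane ",    # often stands in for Telegram
    "\U0001F427": " penguin ", # often stands in for QQ
}

URL_RE = re.compile(r"https?://\S+")

def preprocess_pip_text(text: str) -> str:
    """Replace URLs with positional url-x placeholders and emojis with words."""
    for i, url in enumerate(URL_RE.findall(text)):
        text = text.replace(url, f"url-{i}", 1)
    for emoji, desc in EMOJI_DESCRIPTIONS.items():
        text = text.replace(emoji, desc)
    return text
```

After this step, an IM URL no longer looks like an account ID to the NER model, while the emoji's platform hint survives as a plain word in the token stream.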

To train this contact classification model, a multilingual language model, XLM-RoBERTa (xlm, 2023), was fine-tuned with 80% of the ground truth dataset. The resulting model achieved a micro precision of 95.89% and a micro recall of 97.92% when tested on the remaining 20%. We then applied the contact classifier to all the PIPs, which led to the discovery of 37,621 distinct IM accounts, including 9,644 Telegram accounts, 11,561 WeChat accounts, 12,702 QQ accounts, 225 WhatsApp accounts, and 3,489 LINE accounts. We also manually validated 500 unlabeled PIPs with IM contacts predicted by the NER model, through which a precision of 100% and a recall of 86.18% were observed for this contact extractor.

The PIP cluster analyzer. Given PIPs along with their posting Twitter accounts and embedded contacts, it is interesting to further uncover the illicit promotion campaigns underpinning them. To achieve this, a cluster analyzer is designed to group PIPs that likely belong to the same underlying campaign. We first abstract PIPs as an undirected graph. In this graph, each PIP account is abstracted as a node of the account type, and the size of the node is proportional to the number of PIPs this account has posted. Similarly, each PIP contact is defined as a node of the contact type, whose size is proportional to the number of PIPs containing this contact. Two nodes are connected by an edge if they share one or more PIPs. For instance, if a contact node and an account node are connected, there exist one or more PIPs that were posted by the account and contain the contact entity; if two contact nodes are directly connected, they have been embedded together in one or more PIPs. Given this PIP graph, a flood filling strategy is applied to identify mutually isolated subgraphs, and the resulting subgraphs are considered separate promotion clusters.
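The graph construction and flood-fill clustering can be sketched as below, assuming each PIP record is reduced to an (account, contacts) pair; the data layout and function names are our own, not the paper's implementation:

```python
from collections import defaultdict

def build_pip_graph(pips):
    """Build the undirected PIP graph from (account_id, [contact, ...]) records.

    Accounts and contacts become typed nodes; any two nodes appearing in
    the same PIP are connected. Node "size" (PIP counts) is tracked too.
    """
    adj = defaultdict(set)
    size = defaultdict(int)
    for account, contacts in pips:
        nodes = [("account", account)] + [("contact", c) for c in contacts]
        for node in nodes:
            size[node] += 1
        for a in nodes:          # fully connect nodes sharing this PIP
            for b in nodes:
                if a != b:
                    adj[a].add(b)
    return adj, size

def flood_fill_clusters(adj):
    """Return mutually isolated subgraphs (promotion clusters) as node sets."""
    seen, clusters = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, cluster = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            cluster.add(node)
            stack.extend(adj[node] - seen)
        clusters.append(cluster)
    return clusters
```

With this sketch, two accounts embedding the same QQ number end up in one cluster, while an account with an unrelated Telegram contact forms a separate one.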

3.3. Ethical Considerations

Necessary measures have been taken in our study to avoid potential ethical issues. In particular, when crawling Twitter for PIP-relevant posts and accounts, our tweet crawler strictly respects the rate limits of the Twitter platform. The collected datasets are securely stored on our research server, to which only our researchers have limited access. When measuring the datasets, we focus on generating statistical data points; when presenting necessary examples, we anonymize any data fields that may leak personally identifiable information.

4. Posts of Illicit Promotion

In this section, we profile posts of illicit promotion with regard to their scale, categories, products, distribution, evasion techniques, as well as their availability across different online social networks (OSNs).

Scale. Leveraging both the PIP hunter and the PIP analyzer, by April 23, 2023, we had captured 12,401,082 PIPs in total on the Twitter platform. These PIPs are posted in 5 major natural languages and originate from 580,530 distinct Twitter accounts. Also, 212,689 distinct contact entities have been extracted from these PIPs, consisting of 164,782 URLs (26,831 FQDNs), 37,621 accounts of 5 different instant messaging platforms, 3,511 Twitter mentions, and 6,775 other contacts. Note that, due to our limited access to Twitter data, these results can only serve as a lower bound when estimating the scale of PIPs on Twitter.

Table 5. The ratios of PIPs in daily snapshots of the Twitter stream.

Twitter Stream Daily Snapshot    Tweets       % PIPs
Jan 6, 2023                      4,142,118    3.90%
Feb 6, 2023                      4,119,706    2.69%
Mar 6, 2023                      4,044,769    3.59%
Apr 6, 2023                      3,973,529    4.40%
May 6, 2023                      3,988,035    4.75%
Jun 6, 2023                      3,612,604    3.10%

The ratios of PIPs to the Twitter stream. Another interesting question is what fraction of general tweets are PIPs. As described in §3.1, we utilized daily tweet snapshots from the Twitter Archiving Project to evaluate the generalizability of our binary PIP classifier, which achieved a precision of 99.33%. We then selected the daily tweet snapshot of the 6th day of each month between January 2023 and June 2023, and applied our binary PIP classifier to these 6 snapshots. As presented in Table 5, the ratio of PIPs ranges from 2.69% to 4.75%, a concerning fraction that highlights the prevalence of PIPs as well as the necessity of more mitigation efforts.

Categories. Leveraging the multiclass PIP classifier (§3.2), all the captured PIPs have been classified into one of the 11 well-defined categories of illicit goods and services. The distribution of PIPs across these categories is listed in Table 6. The top 3 categories are p*rnography, Gambling, and Illegal Drug, which together account for 91.17% of PIPs. Below, we provide more observations regarding the categories of PIPs.

Table 6. The distribution of PIPs across categories.

Category                  % PIPs    Category                      % PIPs
p*rnography               69.78%    Weapon Sales                  0.58%
Gambling                  13.34%    Forgery and Fake Documents    0.43%
Illegal Drug              8.04%     Others                        0.32%
Money Laundering          4.09%     Crowdturfing                  0.20%
Data Theft and Leakage    2.03%     Harassment                    0.18%
Surrogacy                 1.16%

Among these categories, p*rnography is the most prevalent, with 69.78% of PIPs belonging to it. Notably, we found that many PIPs were used to promote products or services of child p*rnography, severely violating Twitter's content policy regarding child sexual exploitation (chi, 2023). Also, to evade detection, many p*rnography PIPs contain only innocent or seemingly harmless text; instead, the images attached to these tweets contain the visual elements that both hint at child p*rnography and provide explicit instructions on how to access it. For instance, as shown in Figure 3(a), the tweet text does not reveal itself as p*rnography, nor does it specify how to access the resources for child p*rnography, but the attached image instructs the user to go to the author's homepage to find the Telegram group (Figure 3(b)).


Additionally, geographical names are often used in PIPs to advertise location-based illegal services, e.g., local sex services. For example, the following two tweets from two different accounts, “Candice Bridges Elton CopperField #Guangzhou #Guangzhou Massage” and “Lesley Browne Baldwin Yerkes #Guangzhou #Guangzhou Massage”, both use massage and wellness as a cover to promote illegal sex services in Guangzhou, China. The two differ only in their text while sharing the same images, hashtags, geographical names, and contacts.

Besides, 8.04% of PIPs involve the promotion of various drugs, e.g., methamphetamine, marijuana, and heroin. As illustrated in Figure 4, such tweets promote drugs in Chinese and Thai, respectively, and utilize jargon words; for example, ”猪肉”, a Chinese word denoting pork, is used in illicit promotion to refer to methamphetamine. It is surprising to see them posted successfully on Twitter, since even the tweet text alone appears sufficient to identify them as drug-related, not to mention the hashtags and the images. Besides, weapon PIPs often specify the detailed model of the weapons on sale, and some even include images to demonstrate an item's availability. Similarly, harassment PIPs are mainly used to promote harassment as a service, such as SMS bombing and phone call bombing. Lastly, PIPs in the category of data theft and leakage encompass many products, such as hacking services, stolen account credentials, unauthorized access to confidential information, and personal data leakage. Concrete PIP examples for these categories can be found in Appendix A.


PIP distribution across Twitter accounts. Given Twitter accounts with one or more PIPs posted, we analyzed their distribution across the PIPs they have posted. We first measured the ratio of PIPs over all the tweets an account has posted. Figure 5(a) plots the cumulative distribution of Twitter accounts over their PIP ratio, which is highly skewed: 61.96% of PIP accounts have 20% or fewer of their tweets being PIPs, yet 2,081 PIP accounts have a PIP ratio higher than 90%. Besides, we measured PIP accounts by the absolute number of PIPs each has posted, as presented in Figure 5(b): 93.30% of accounts posted fewer than 10 PIPs, while 95.96% posted fewer than 20. Furthermore, ordering the PIP accounts by their number of PIPs in descending order, the top 1K accounts (0.17%) contribute 32.68% of PIPs, and the top 10K accounts (1.7% of all PIP accounts) contribute 68.12%. We thus conclude that most PIP accounts post a low volume of PIPs, while the top accounts contribute a large portion of all PIPs.
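The top-K contribution figures above boil down to sorting accounts by their PIP counts and summing the head of the distribution; a minimal sketch (our own helper, not from the paper):

```python
def top_k_share(pip_counts, k):
    """Fraction of all PIPs contributed by the k accounts with the most PIPs.

    pip_counts: per-account PIP counts (any order); returns 0.0 if empty.
    """
    ordered = sorted(pip_counts, reverse=True)
    total = sum(ordered)
    return sum(ordered[:k]) / total if total else 0.0
```

Running this over the per-account counts with k = 1,000 and k = 10,000 reproduces the kind of skew statistics reported above.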

We further investigated the accounts whose posts are all detected as PIPs, which comprise 7.87% of all PIP accounts. Our analysis reveals that such accounts tend to be dedicated to promoting a specific category of PIPs. Also, some accounts have posted very few tweets since registration and rely primarily on their profiles rather than tweets for illicit promotion. Moreover, among the top accounts with the most PIPs, we observe that a Twitter account can post hundreds of thousands of PIPs without getting banned by the platform. For instance, a PIP account registered in February 2020 has posted over 198K PIPs in the category of data leakage, which strongly indicates the limitations of Twitter's content moderation.

PIP engagements. Over time, PIPs accumulate engagements such as likes, replies, retweets, and quotes (i.e., retweets with comments). Given the date t_p on which a PIP is published and the date t_c on which it is crawled, t_e = t_c - t_p is the elapsed time of the PIP, i.e., the days passed since it was posted. As described in §5, 90% of PIPs survive the first two months; we thus group PIPs by elapsed time ranging from 1 day to 60 days and calculate the average engagement of each group. As shown in Figure 6, PIPs indeed receive a non-negligible volume of engagement; e.g., a PIP receives 374 likes on average 30 days after being posted, while a regular tweet receives 37 likes on average (ave, 2021). We also observe that the extent of engagement declines gradually after 35 days. We believe this is because PIPs with more engagement are more likely to be detected and unpublished by Twitter, introducing a survivorship bias wherein only surviving cases are counted in groups of long elapsed time. However, the cause of the significant fluctuations in average retweets after 40 days remains unclear.
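The elapsed-time grouping can be sketched as below, assuming each PIP is reduced to its posting date, crawl date, and a like count; this record layout is our assumption, and the same bucketing applies to replies, retweets, and quotes:

```python
from collections import defaultdict
from datetime import date

def average_engagement_by_elapsed_day(pips, max_days=60):
    """Average engagement per elapsed-time bucket t_e = t_c - t_p (in days).

    pips: iterable of (posted_date, crawled_date, likes) tuples; PIPs with
    an elapsed time outside [1, max_days] are dropped, mirroring the 60-day
    window used above.
    """
    buckets = defaultdict(list)
    for posted, crawled, likes in pips:
        elapsed = (crawled - posted).days
        if 1 <= elapsed <= max_days:
            buckets[elapsed].append(likes)
    return {d: sum(v) / len(v) for d, v in sorted(buckets.items())}
```

For instance, two PIPs crawled 30 days after posting with 300 and 448 likes average to 374 likes in the 30-day bucket, matching the scale of the figure reported above.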


PIP distribution across natural languages. To recognize the natural language of a PIP, we applied fastText (https://fasttext.cc/), a library for text representation and classification from Facebook's AI Research (FAIR) lab. As a result, most PIPs (over 99%) fall into 5 major natural languages: Chinese, English, Japanese, Thai, and Spanish. In terms of illicit categories, the top 5 languages are similar to each other, as shown in Table 7: p*rnography takes a major place in almost all of them, followed by gambling or illegal drug. Considering that products of these categories are illegal yet in great demand in most countries, these results are expected.

Table 7. The distribution of PIPs across natural languages.

Language    % PIPs    Top-3 Categories
Chinese     48.75%    p*rnography, Gambling, Illegal Drug
English     38.24%    p*rnography, Gambling, Illegal Drug
Japanese    9.98%     p*rnography, Gambling, Illegal Drug
Thai        1.70%     Illegal Drug, Surrogacy, p*rnography
Spanish     0.54%     Illegal Drug, p*rnography, Gambling

Finding I: Millions of illicit promotion posts have been observed on the Twitter platform, which are widely distributed in terms of Twitter accounts, categories and products of illicit goods and services, and natural languages.

Evasion tactics of PIPs. To comprehensively analyze the evasion tactics of PIPs, we randomly sampled 500 PIPs for each illicit category, manually looked into each of them to distill tactics, and conducted quantitative analysis on the whole PIP dataset to verify their applicability. The tactics adopted by miscreants to evade PIP detection can be summarized into four categories, as detailed below.

Hashtags. On one hand, benign and popular hashtags are extensively abused by miscreants to masquerade their PIPs. Across all tweets we collected, non-PIPs have a median of 0 hashtags and an average of 2.27, while the respective numbers are 5 and 6.98 for PIPs, much larger than Twitter's official recommendation of one or two relevant hashtags (twi, 2023a). Specifically, 73.43% of PIPs embed three or more hashtags while 49.47% embed five or more. Besides, PIP operators may exploit hashtags of trending topics or popular events to extend their tweets' reach. For instance, during the FIFA World Cup 2022, illegal gambling operators were found to inject football hashtags into their PIPs, e.g., #WorldCup, #WorldCup2022, or #WorldCupBetting. Similarly, operators of surrogacy PIPs embedded hashtags closely relevant to the Mid-Autumn Festival, a popular Chinese festival celebrating family reunion and togetherness; such hashtags include a Chinese hashtag meaning the Mid-Autumn Festival and another meaning family reunion. In addition to increasing visibility, the injection of benign and popular hashtags into PIPs may mislead Twitter's content moderation to some extent. On the other hand, miscreants compose malicious hashtags so as to keep the main text of a PIP benign while promoting illicit goods and services in the hashtags, e.g., a Chinese hashtag denoting domestic surrogacy, a Thai hashtag denoting Cannabis Bangkok, and another Chinese hashtag denoting mobile phone eavesdropping.

Jargon words. Jargon words are commonly used across PIPs of different categories. Through looking into PIPs, we have manually identified a set of 108 different jargon words that are embedded in 31.21% of all PIPs, which can only serve as a lower-bound estimate for the adoption of jargon words in PIPs. The adoption of jargon words is observed for all the illicit categories. For example, in Chinese drug PIPs, ”叶子” (leaves) or its emoji is used to refer to marijuana, while ”猪肉” (pork) and ”冰” (ice) and their emojis represent methamphetamine; these terms are derived from either the color or the shape of the respective illegal items. Additionally, metaphors for the nature of the activity are also popular as jargon; for instance, the farmer and its emoji are used in drug PIPs to refer to people who grow marijuana. The use of jargon words helps illicit promotion operators blend their content with benign tweets, thus impeding content moderation. More jargon examples can be found in Appendix B.

Embedding illicit promotion messages into components of a PIP other than its main text. Another evasion pattern we have observed is to embed the promotional text into locations other than the body text of a PIP. Such locations include attached media files, the username of the PIP account, the description of the PIP account, hashtags, or even poll options.

Finding II: Illicit promotion campaigns have adopted various evasion tactics, likely in an attempt to evade content moderation of the Twitter platform.

Table 8. The availability of PIPs across other OSNs (PIP keywords with at least one PIP located / keywords sampled).

Category                      Tiktok    Youtube    Facebook
p*rnography                   5/20      3/20       16/20
Gambling                      16/20     14/20      19/20
Illegal Drug                  7/20      1/20       19/20
Surrogacy                     8/20      5/20       15/20
Harassment                    6/20      10/20      17/20
Money Laundering              14/20     9/20       17/20
Weapon Sales                  5/20      5/20       6/20
Data Theft and Leakage        16/20     4/20       19/20
Forgery and Fake Documents    9/20      4/20       18/20
Crowdturfing                  19/20     17/20      20/20
Hit Ratio                     52%       35%        83%

Availability across online social networks. To investigate the availability of PIPs across social network platforms, we conducted a cross-platform analysis of three other major platforms beyond Twitter: YouTube, Facebook, and TikTok. For each PIP category, we randomly selected 20 PIP keywords that led to PIPs of the respective category on Twitter, and manually searched these platforms for PIPs. As shown in Table 8, PIPs are present on all three platforms, albeit with varying degrees of prevalence. In particular, Facebook has the highest hit ratio, with 83% of the sampled search keywords having at least one PIP located, while Youtube has the lowest at 35%. Besides, the availability of PIPs varies across platforms and categories. For example, the hit ratios of both gambling and data theft and leakage are higher on Tiktok than those of other categories. On the other hand, the promotion of weapon sales has a low hit ratio on all three platforms, with at most 30% of the sampled seeds having one or more PIPs observed. Interestingly, we found that some PIPs on different platforms share the same contact, indicating that they belong to the same promotion campaign.

Finding III: Illicit promotion has also been observed at a concerning prevalence on three other popular OSNs: Facebook, Youtube, and TikTok.

5. Content Moderation Against Illicit Promotion

All OSNs under our study claim to enforce strict content moderation (tik, 2023; twi, 2023b; you, 2023), in which case violative posts should either be blocked from publishing or get unpublished once detected later. However, the existence of so many violative PIPs across diverse categories of illicit goods and services suggests that the respective OSNs, especially Twitter, fail to prevent PIPs from being posted and becoming visible to OSN users. In this section, we present more in-depth observations regarding Twitter's content moderation of PIPs and PIP accounts.

Table 9. Evasion rates of sampled PIP groups across four revisits (RV-1 to RV-4)*.

Group    Tweeting Period       RV-1      RV-2      RV-3      RV-4
PIP-1    Oct 24-30, 22         21.69%    21.59%    21.50%    21.46%
PIP-2    Oct 31-Nov 6, 22      22.33%    22.25%    22.12%    22.08%
PIP-3    Dec 5-11, 22          57.78%    56.41%    56.02%    55.87%
PIP-4    Dec 12-18, 22         76.18%    73.08%    72.93%    71.26%
PIP-5    Jan 16-22, 23         98.27%    97.62%    97.62%    97.62%
PIP-6    Jan 23-29, 23         98.16%    97.90%    97.90%    97.11%
PIP-7    Feb 27-Mar 5, 23      95.04%    94.49%    93.92%    93.30%
PIP-8    Mar 6-12, 23          94.74%    94.11%    93.52%    93.04%

* Revisit RV-1 was conducted on April 10-13, 2023; RV-2 on April 15-19, 2023; RV-3 on April 22-26, 2023; and RV-4 on April 29-May 1, 2023.

To further profile the effectiveness of Twitter's content moderation of published PIPs, we periodically revisited captured PIPs. Specifically, 50,000 PIPs hunted in each round are sampled, and their availability is tested by revisiting them periodically (usually at a weekly pace). Given sampled PIPs first posted on date date_p, their evasion rate ER_{date_p, date_r} is defined as the ratio of PIPs that are still reachable when revisited on date date_r. Table 9 presents the revisiting results between April 10 and May 1, 2023, for PIPs posted between October 24, 2022 and March 12, 2023. We can see that 21.46% of PIPs posted between Oct 24-30, 2022 were still reachable when revisited six months later at the end of April 2023. Also, more than 90% of PIPs survive the first two months after being published.
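The evasion rate ER_{date_p, date_r} reduces to a simple ratio; a minimal sketch, assuming each revisit yields the set of PIP IDs that still resolved:

```python
def evasion_rate(sampled_pips, reachable):
    """ER_{date_p, date_r}: fraction of sampled PIPs (posted on date_p)
    that are still reachable when revisited on date_r.

    sampled_pips: IDs of the PIPs sampled from one tweeting period;
    reachable: set of PIP IDs that still resolved during the revisit.
    """
    sampled = list(sampled_pips)
    if not sampled:
        return 0.0
    return sum(1 for pip in sampled if pip in reachable) / len(sampled)
```

Computing this per tweeting-period group and per revisit round reproduces the structure of Table 9.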

We then investigated why some PIPs become unavailable during revisits and found that most cases are due to account suspension. Specifically, Twitter returns one of six error messages when a PIP is unavailable, and these messages reveal the underlying reasons for the unavailability. For instance, a PIP may be unpublished due to the suspension of its parental account, in which case the error message is ”This Tweet is from a suspended account.”. Other reasons include page non-existence, deletion by the author, account non-existence, and violation of Twitter's rules. Among PIPs unreachable during revisits, 91.59% are due to account suspension and 6.22% are due to page non-existence. In summary, Twitter works in a continuous manner to detect and suspend PIP accounts, and the tweets (including PIPs) of the suspended accounts become unpublished as well.

However, many PIPs can still survive for a long period, and we further investigated how unpublished PIPs differ from surviving ones. To answer this question, PIPs were sampled and divided into two groups depending on the length of their evasion time as observed during revisiting. We then compared both groups with regard to various aspects, e.g., PIP categories, text syntax and semantics, writing style, and characteristics of the posting accounts. The only significant difference we observed resides in the number of PIPs posted by the parental Twitter accounts: for the group of PIPs with a short lifetime, the parental accounts have 379 PIPs observed on average in our dataset, while it is only 62 for the group with a much longer evasion time. A reasonable explanation is that the more PIPs a Twitter account has posted, the more likely it is to be captured and suspended by the platform, in which case all its tweets, including its PIPs, become unavailable. To further profile this observation, we sampled 153,921 of the 580,530 observed PIP accounts and revisited them during April 15-19, 2023. By then, only 78.95% of the accounts were still available while all the others had been suspended. Comparing alive PIP accounts with suspended ones reveals that alive accounts have 3 PIPs observed on average while suspended ones have 87.

Finding IV: An arms race is observed between illicit promotion campaigns and Twitter's content moderation: almost 80% of PIPs get banned within six months of being published, while on the other hand 90% of PIPs survive the first two months.

6. Contacts and Operators of Illicit Promotion

Having extensively profiled PIPs, we now move the spotlight to the extracted PIP contacts as well as the underlying promotion operators (campaigns).

6.1. PIP Contacts

Scale. Utilizing our PIP contact extractor, we extracted a total of 212,689 unique contacts from all the PIPs and PIP account profiles on the Twitter platform. These contacts comprise 12,702 QQ accounts, 11,561 WeChat accounts, 9,644 Telegram accounts, 3,489 LINE accounts, 225 WhatsApp accounts, 3,511 Twitter accounts, 164,782 URLs (corresponding to 26,831 fully qualified domain names (FQDNs)), and 6,775 other accounts.

Besides, among these contacts, 23.98% are exclusively identified from account profiles, 73.92% only from PIPs, and the remaining 2.10% from both. On the other hand, 28.02% of PIP accounts have one or more such contacts embedded in either their profiles or their PIPs.

In summary, while promoting illicit goods and services at a large scale on Twitter, the underlying operators prefer platforms other than OSNs for further interaction with their customers, especially instant messaging services. This highlights the necessity and importance of cross-platform collaboration in terms of fighting against illicit promotion activities.

Distribution. We have also measured the distribution of contacts across PIPs and PIP accounts, and a major observation is that a notable fraction of contacts are promoted through multiple PIPs and across multiple PIP accounts. Specifically, 5.09% of contacts are embedded in 5 or more PIPs, and 2.86% in 10 or more.

Finding V: When it comes to communication with prospective customers, operators of illicit promotion prefer instant messaging platforms and self-managed websites rather than OSN platforms.

Evasion techniques for promoting PIP contacts. As discussed in §4, PIP operators adopt various evasion tactics to make the resulting PIPs appear benign. Furthermore, various techniques are also used to hide a contact deep within the respective PIP, which makes it challenging for the OSN platform to extract the contacts from a detected PIP. Such contact-wise evasion techniques fall into three categories, as detailed below.

Linguistic obfuscation. This tactic uses code words, emojis, abbreviations, or unconventional formatting to obscure contact information. From the extracted contacts in our dataset, we randomly sampled 200 tweets to further investigate the extent of linguistic obfuscation used by illicit operators; a striking 59% of the samples employed such techniques. For instance, "QQ" is often replaced with "企鹅", "扣扣", or the penguin emoji (U+1F427), while "WeChat" is replaced with "薇", "V", "VX", the heavy black heart (U+2764), or the satellite emoji (U+1F6F0). Similarly, "Telegram" is often substituted with "airplane" or the airplane symbol (U+2708), and "LINE" with "赖". These substitutions are based on similar pronunciations (e.g., "Line" sounds like "赖" in Chinese, and the Chinese names for U+1F6F0 and U+2764 resemble "WeChat") or on resemblances to the application's icon (e.g., the Telegram icon is an airplane (U+2708), and the QQ icon is a penguin (U+1F427)).
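The substitutions above can be countered with a simple normalization pass before contact extraction. The sketch below maps observed aliases back to canonical platform names; the alias table and function name are illustrative, not the paper's actual implementation, and a production version would need a far larger (and regularly updated) alias list.

```python
# Hypothetical alias table built from the substitutions described in §6.1:
# code words, homophones, and emojis that stand in for IM platform names.
ALIASES = {
    "qq": ["企鹅", "扣扣", "\U0001F427"],            # penguin emoji resembles the QQ icon
    "wechat": ["薇", "vx", "\u2764", "\U0001F6F0"],  # homophones, heart, satellite emoji
    "telegram": ["airplane", "\u2708"],              # the Telegram icon is an airplane
    "line": ["赖"],                                  # "Line" sounds like 赖 in Chinese
}

def normalize_platform_mentions(text: str) -> str:
    """Replace known obfuscated aliases with canonical platform names."""
    lowered = text.lower()
    for canonical, aliases in ALIASES.items():
        for alias in aliases:
            lowered = lowered.replace(alias.lower(), canonical)
    return lowered

print(normalize_platform_mentions("加我企鹅 12345, or \u2708 @seller"))
# -> 加我qq 12345, or telegram @seller
```

After normalization, a downstream NER-based extractor only needs to recognize canonical platform names followed by account identifiers, rather than every obfuscated variant.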

Visual representation and steganography. The visual representation of contacts involves presenting contact information as visual elements in images or even videos. This can be further strengthened by the use of handwritten text (Figure 4(a)), which makes it difficult even for OCR systems to recognize and extract the respective contact information. Embedding contact information within seemingly innocuous elements is another form of stealthy promotion, e.g., placing the contact in poll options instead of the tweet's main text. Additionally, illicit operators may embed contact information within arbitrary frames of innocuous videos. Currently, our PIP contact extractor supports only the retrieval of contacts from text inputs, namely the PIP text and the profile text of a PIP account. We leave extracting contacts from media files and other seemingly innocuous components of a PIP as future work.

Private messaging. By requesting potential buyers to send private messages for more information (e.g., Figure 8(a) in Appendix A), illicit operators avoid directly exposing their contact information on OSNs. This tactic enables them to selectively respond to potential buyers and increase the effectiveness of their promotional efforts.

Contact preference across PIP categories. Regarding contact selection, illicit promotion campaigns of different categories exhibit distinct preferences. Table 10 presents, for each PIP category, how the extracted contact entities are distributed across well-known contact types. p*rnography, money laundering, and weapon sales predominantly prefer Telegram accounts, which are associated with 43.60%, 68.17%, and 73.36% of the PIPs of the corresponding category, respectively. For harassment activities, QQ emerges as the most favored contact method, accounting for 56.23% of harassment PIPs. Besides, for data theft & leakage and fake documents, WeChat is much more popular than the others.

Category¹                Telegram   WeChat    QQ        LINE     WA²
p*rnography              43.60%     30.24%    34.02%    2.49%    0.02%
Illegal Drug             36.99%     24.91%    13.79%    14.73%   1.12%
Gambling                 42.45%     41.39%    24.73%    2.96%    0.97%
Surrogacy                22.92%     67.76%    14.26%    0.03%    1.38%
Harassment               25.60%     22.78%    56.23%    0.01%    0.00%
Money Laundering         68.17%     19.77%    17.91%    0.04%    0.24%
Weapon Sales             73.36%     18.80%    7.86%     0.10%    2.50%
Data Theft and Leakage   35.75%     58.91%    11.51%    0.02%    1.50%
FFD²                     16.64%     68.76%    14.81%    0.47%    0.00%
Crowdturfing             55.23%     44.18%    11.04%    0.00%    0.00%

¹ Note that one PIP may include more than one contact.
² WA denotes WhatsApp and FFD is short for Forgery and Fake Documents.
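Statistics in the style of Table 10 can be derived by tallying, per category, the fraction of PIPs that embed each contact type. The sketch below assumes each PIP record is a (category, contact-types) pair; the sample records and function name are illustrative, not the paper's dataset or code.

```python
from collections import Counter, defaultdict

# Toy PIP records: (category, list of contact types embedded in the PIP).
pips = [
    ("Gambling", ["Telegram", "WeChat"]),
    ("Gambling", ["QQ"]),
    ("Harassment", ["QQ"]),
    ("Harassment", ["QQ", "Telegram"]),
]

def contact_distribution(records):
    """Percentage of PIPs per category embedding each contact type.
    Column sums may exceed 100% since one PIP can carry several
    contact types (see the table footnote)."""
    totals = Counter(cat for cat, _ in records)
    hits = defaultdict(Counter)
    for cat, types in records:
        for t in set(types):          # count each type at most once per PIP
            hits[cat][t] += 1
    return {cat: {t: 100.0 * n / totals[cat] for t, n in counts.items()}
            for cat, counts in hits.items()}

dist = contact_distribution(pips)
print(dist["Harassment"]["QQ"])  # -> 100.0 in this toy sample
```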

The adoption of novel IM platforms. In addition to the widely used communication platforms, we also observe the adoption of newly emerged end-to-end encrypted communication platforms, such as Wickr(wic, 2023), BatChat(bat, 2023), and Potato Chat(pot, 2023). All of these platforms support end-to-end encrypted communication, just like Telegram and WhatsApp. Furthermore, we find that they offer one or more novel security features that enable stealthier communication compared with the aforementioned, better-established platforms.

Specifically, BatChat provides a secret chat mode in which both parties' avatars are encoded, making it difficult to identify the participants. Additionally, the platform disables taking screenshots and forwarding chat messages, further safeguarding the content of the conversation. The secret chat mode of Potato Chat supports even more security features. For instance, Potato enforces message deletion by instructing the recipient's app to delete messages when the sender removes them. Furthermore, users can set self-destruction timers for messages, photos, videos, and files, automatically removing the content from both devices after a specified time.

Finding VI: Illicit promotion campaigns are increasingly adopting emerging end-to-end encrypted communication platforms, e.g., Wickr, BatChat, and Potato Chat.

Threat alerts from VirusTotal. To profile the threats of websites embedded in PIPs, we retrieved and analyzed their threat reports from VirusTotal(vir, 2023), a widely recognized open threat exchange platform. Due to the rate limit of VirusTotal, we randomly sampled 20K PIP website URLs (corresponding to 7,391 FQDNs) for analysis. As shown in Table 11, 0.80% of the URLs have one or more alarms triggered, while it is 2.46% for FQDNs. Furthermore, among the alarmed URLs, 11.38% are flagged as malware websites and 24.39% as phishing websites.

Category   Count    Reported   Alarmed   Malware   Phishing
URLs       20,000   7.88%      0.80%     0.07%     0.15%
FQDNs      7,391    69.30%     2.46%     0.32%     0.70%
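The VirusTotal lookup behind Table 11 can be sketched as follows. The unpadded urlsafe-base64 URL identifier follows the public VT API v3 documentation; the report dicts below are mock stand-ins shaped like VT's `last_analysis_stats`, so the example stays offline, and the alarm threshold is our assumption.

```python
import base64

def vt_url_id(url: str) -> str:
    """VT API v3 URL identifier: urlsafe base64 of the URL, padding stripped."""
    return base64.urlsafe_b64encode(url.encode()).decode().rstrip("=")

def is_alarmed(stats: dict, threshold: int = 1) -> bool:
    """Treat a URL as alarmed when >= threshold engines flag it
    as malicious or suspicious (threshold choice is an assumption)."""
    return stats.get("malicious", 0) + stats.get("suspicious", 0) >= threshold

# Mock reports standing in for GET /api/v3/urls/{vt_url_id(url)} responses.
mock_reports = {
    "http://example-pip.test/a": {"malicious": 3, "suspicious": 1, "harmless": 60},
    "http://example-pip.test/b": {"malicious": 0, "suspicious": 0, "harmless": 64},
}
alarmed = [u for u, s in mock_reports.items() if is_alarmed(s)]
print(f"{len(alarmed)}/{len(mock_reports)} URLs alarmed")  # -> 1/2 URLs alarmed
```

A real run would issue the GET request with an `x-apikey` header and pace requests to respect the API rate limit mentioned above.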

6.2. Cybercrime Operators

Through applying the clustering technique introduced in §3.2 to all PIPs captured on Twitter, many PIP clusters have been uncovered; each cluster consists of PIP tweets, their parental Twitter accounts (PIP authors), and the contacts embedded in these PIPs. We then conducted a manual study of sampled clusters, which confirms that most clusters appear to be separate PIP campaigns. Next, we detail the observations distilled from analyzing these PIP campaigns.
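At its core, this clustering step groups accounts and contacts into connected components of a bipartite account-contact graph. The union-find sketch below is a minimal illustration of that idea, assuming the paper's actual §3.2 pipeline; the edge list is hypothetical.

```python
# Each edge links a PIP account to a contact it embeds; connected
# components of this graph are treated as candidate campaigns.
def find(parent, x):
    """Union-find lookup with path halving."""
    while parent.setdefault(x, x) != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def campaigns(edges):
    """Group accounts and contacts into connected components."""
    parent = {}
    for account, contact in edges:
        parent[find(parent, account)] = find(parent, contact)
    groups = {}
    for node in parent:
        groups.setdefault(find(parent, node), set()).add(node)
    return list(groups.values())

edges = [("acct1", "wechat:abc"), ("acct2", "wechat:abc"), ("acct3", "tg:xyz")]
print(len(campaigns(edges)))  # -> 2 candidate campaigns
```

An account with no shared contact ends up in its own component, which matches the singleton campaigns discussed below.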

Scale of illicit promotion operators. In total, we have observed 474,979 distinct PIP campaigns. Among them, 93.50% turn out to be singleton groups, where each cluster contains only an author node (i.e., a PIP account) and the tweets published by this account. These singleton groups constitute 73.9% of PIPs and 76.5% of PIP accounts. As for the remaining 6.50% of campaigns, 85% involve only one contact, 8.61% have two contacts, and 3.85% have three or more. With respect to the distribution across categories, 87.54% of campaigns promote a single PIP category, 9.91% promote two categories, and 2.55% promote three or more.

Four representative illicit promotion campaigns. We looked further into the campaigns and discovered several commonly used schemes in cybercrime promotion. In general, the most notable characteristic of these campaigns is that most operators control at least two accounts to accomplish promotion, and these accounts are connected by common contacts or websites. We present four campaigns below under the names Cluster-1 to Cluster-4.

Cluster-1. Figure 7(a) visualizes Cluster-1 as a graph wherein nodes denote either PIP accounts or contacts, and the size of each node represents the number of PIPs it is associated with. As shown in Figure 7(a), Cluster-1 consists of 3 nodes (two account nodes and one WeChat contact), 3 edges, and 6,328 PIPs, all classified as data leakage. The two Twitter accounts have published similar numbers of PIPs and promote the same WeChat contact, and are thus very likely operated by the same person or organization.

[Figure 7: Graph visualizations of the four representative campaigns, Cluster-1 to Cluster-4.]

Cluster-2. As shown in Figure 7(b), Cluster-2 consists of 290 nodes and 349 edges, containing 412 PIPs, all classified as p*rnography. Different from Cluster-1, Cluster-2 is composed of many more small-scale accounts connected to four fully qualified domain names: voice-live.liblo.jp, live-video.golog.jp, live-video.liblo.jp, and video-liv-e.liblo.jp. By looking up DNS resolutions for these FQDNs, we discovered that they share the same server IP (147.92.146.242) of a p*rn website. It is therefore reasonable to infer that Cluster-2 is operated in an organized way by the operator of that website.
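The infrastructure check behind Cluster-2 amounts to grouping promoted FQDNs by the server IP they resolve to. A real run would call a DNS resolver (e.g., `socket.gethostbyname`); the sketch below uses a mock resolution table so it stays offline, with the domains and IP taken from the text above.

```python
from collections import defaultdict

# Mock DNS resolutions for the four FQDNs promoted by Cluster-2.
resolutions = {
    "voice-live.liblo.jp": "147.92.146.242",
    "live-video.golog.jp": "147.92.146.242",
    "live-video.liblo.jp": "147.92.146.242",
    "video-liv-e.liblo.jp": "147.92.146.242",
}

def group_by_ip(resolved: dict) -> dict:
    """Map each IP to the set of FQDNs it serves; many FQDNs behind
    one IP is a hint of a common operator."""
    by_ip = defaultdict(set)
    for fqdn, ip in resolved.items():
        by_ip[ip].add(fqdn)
    return dict(by_ip)

shared = group_by_ip(resolutions)
print(len(shared["147.92.146.242"]))  # -> 4 FQDNs behind one IP
```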

Cluster-3. As shown in Figure 7(c), Cluster-3 consists of 106 nodes, 125 edges, and 516 PIPs, 97.10% of which are classified as fake documents. It is worth noting that a subset of author-type nodes are connected by red edges and form a fully connected component in the middle of Figure 7(c), which means that these accounts share common contacts embedded in their user profiles. As mentioned before, embedding contacts in components other than the main text of the tweet is a common evasion tactic utilized by miscreants.

Cluster-4. As shown in Figure 7(d), Cluster-4 has 708 nodes and 887 edges, associated with 702 PIPs, 99.02% of which are classified as data leakage. Its nodes consist of 163 authors, 29 QQ contacts, 21 Telegram contacts, and 495 WeChat contacts. Unlike Cluster-1 to Cluster-3, which mainly use a single kind of contact, Cluster-4 combines diverse contact types with a high concentration on a single category of illicit goods and services, which makes it a special case.

7. Discussion

Recommendations for real-world PIP mitigation. We recommend that OSNs invest more effort into detecting and removing PIPs from their platforms, particularly PIPs not in English. Also, PIP detection should not be limited to scrutinizing text elements, but should take into consideration all elements of an OSN post and its posting account. Considering that PIPs are operated as campaigns, clustering-based detection will be promising and efficient if a low false alarm rate can be achieved. As demonstrated in our evaluation, the series of tools developed in this study can benefit future efforts to mitigate illicit promotion on online social networks. Besides, many PIP operators prefer IM channels for interacting with potential customers while promoting on OSNs, which emphasizes the importance of cross-platform collaboration, especially between OSNs and IM platforms.

Responsible disclosure. We have been reporting PIPs and PIP accounts to Twitter, Telegram, and other related IM platforms. As of this writing, LINE has responded, assuring us that they are investigating but are unable to share the results of the investigation. Tencent Security has confirmed and is fixing the issues on QQ and WeChat. For Twitter and Telegram, we reported through both web forms and emails, but have yet to receive any concrete response.

Code and data release. To facilitate reproduction of our machine learning models (e.g., the binary PIP classifier and the NER-based contact extractor), the source code for training and testing will be released along with the respective ground-truth datasets. To avoid misuse by miscreants, the whole dataset of PIPs and their contacts will be provided upon request and background checking.

8. Conclusion

Through this study, we have qualified and quantified the prevalence of diverse posts of illicit promotion (PIPs) on Twitter. We also observe that illicit operators adopt various evasion tactics when composing and distributing PIPs, which partially explains why so many PIPs could circumvent the content moderation of Twitter, get posted, and stay alive for months before being unpublished. What is also observed is that accounts on instant messaging platforms, especially end-to-end encrypted ones, are frequently used as the next hops for communication between PIP victims and the underlying illicit operators. Such a cross-platform operation pattern also highlights the importance of security collaboration among OSN and instant messaging platforms for the mitigation of illicit promotion activities.

References

  • (1)
  • ave (2021)2021.The behaviors and attitudes of U.S. Adults on Twitter.https://www.pewresearch.org/wp-content/uploads/sites/20/2021/11/PDL_11.15.21_Twitter_users_final_report.pdf.
  • wic (2023)2023.AWS Wickr.https://wickr.com/.
  • bat (2023)2023.BatChat.https://www.batchat.com.
  • mul (2023)2023.bert-base-multilingual-cased.https://huggingface.co/bert-base-multilingual-cased.
  • tik (2023)2023.Community Guidelines of TikTok.https://www.tiktok.com/community-guidelines/en/.
  • twi (2023a)2023a.The dos and don’ts of hashtags.https://business.twitter.com/en/blog/the-dos-and-donts-of-hashtags.html.
  • pot (2023)2023.Potato Chat.https://potato.im/.
  • pye (2023)2023.pyemoji.https://pypi.org/project/pyemoji/.
  • chi (2023)2023.Twitter Child sexual exploitation policy.https://help.twitter.com/en/rules-and-policies/sexual-exploitation-policy.
  • twi (2023b)2023b.The Twitter Rules.https://help.twitter.com/en/rules-and-policies/twitter-rules.
  • twi (2023c)2023c.Twitter Rules and Policies.https://help.twitter.com/en/rules-and-policies#safety-and-cybercrime.
  • vir (2023)2023.VirusTotal.https://www.virustotal.com/.
  • xlm (2023)2023.xlm-roberta-base.https://huggingface.co/xlm-roberta-base.
  • you (2023)2023.YouTube’s Community Guidelines.https://support.google.com/youtube/answer/9288567.
  • Aldridge and Askew (2017)Judith Aldridge and Rebecca Askew. 2017.Delivery dilemmas: How drug cryptomarket users identify and seek to reduce their risk of detection by law enforcement.International Journal of Drug Policy 41 (2017), 101–109.
  • Benevenuto etal. (2010)Fabricio Benevenuto, Gabriel Magno, Tiago Rodrigues, and Virgilio Almeida. 2010.Detecting spammers on twitter. In Collaboration, electronic messaging, anti-abuse and spam conference (CEAS), Vol.6. 12.
  • Caballero etal. (2011)Juan Caballero, Chris Grier, Christian Kreibich, and Vern Paxson. 2011.Measuring pay-per-install: the commoditization of malware distribution.. In Usenix security symposium, Vol.13. 1–13.
  • Cao etal. (2012)Qiang Cao, Michael Sirivianos, Xiaowei Yang, and Tiago Pregueiro. 2012.Aiding the Detection of Fake Accounts in Large Scale Social Online Services. In 9th USENIX Symposium on Networked Systems Design and Implementation (NSDI 12). USENIX Association, San Jose, CA, 197–210.https://www.usenix.org/conference/nsdi12/technical-sessions/presentation/cao
  • Christin (2013)Nicolas Christin. 2013.Traveling the Silk Road: A measurement analysis of a large anonymous online marketplace. In Proceedings of the 22nd international conference on World Wide Web. 213–224.
  • Devlin etal. (2018)Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018.Bert: Pre-training of deep bidirectional transformers for language understanding.arXiv preprint arXiv:1810.04805 (2018).
  • Du etal. (2016)Kun Du, Hao Yang, Zhou Li, Haixin Duan, and Kehuan Zhang. 2016.The Ever-Changing Labyrinth: A Large-Scale Analysis of Wildcard DNS Powered Blackhat SEO. In 25th USENIX Security Symposium (USENIX Security 16). USENIX Association, Austin, TX, 245–262.https://www.usenix.org/conference/usenixsecurity16/technical-sessions/presentation/du
  • Du etal. (2018)Po-Yi Du, Ning Zhang, Mohammedreza Ebrahimi, Sagar Samtani, Ben Lazarine, Nolan Arnold, Rachael Dunn, Sandeep Suntwal, Guadalupe Angeles, Robert Schweitzer, etal. 2018.Identifying, collecting, and presenting hacker community data: Forums, IRC, carding shops, and DNMs. In 2018 IEEE international conference on intelligence and security informatics (ISI). IEEE, 70–75.
  • Gao etal. (2012)Hongyu Gao, Yan Chen, Kathy Lee, Diana Palsetia, and AlokN Choudhary. 2012.Towards online spam filtering in social networks.. In NDSS, Vol.12. 1–16.
  • Gao etal. (2021)Yuhao Gao, Haoyu Wang, Li Li, Xiapu Luo, Guoai Xu, and Xuanzhe Liu. 2021.Demystifying illegal mobile gambling apps. In Proceedings of the Web Conference 2021. 1447–1458.
  • Gomez etal. (2022)Gibran Gomez, Pedro Moreno-Sanchez, and Juan Caballero. 2022.Watch Your Back: Identifying Cybercrime Financial Relationships in Bitcoin through Back-and-Forth Exploration. In Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security (Los Angeles, CA, USA) (CCS ’22). Association for Computing Machinery, New York, NY, USA, 1291–1305.https://doi.org/10.1145/3548606.3560587
  • González-Carvajal and Garrido-Merchán (2020)Santiago González-Carvajal and EduardoC Garrido-Merchán. 2020.Comparing BERT against traditional machine learning text classification.arXiv preprint arXiv:2005.13012 (2020).
  • Grier etal. (2010)Chris Grier, Kurt Thomas, Vern Paxson, and Michael Zhang. 2010.@ spam: the underground on 140 characters or less. In Proceedings of the 17th ACM conference on Computer and communications security. 27–37.
  • He etal. (2016)Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016.Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition. 770–778.
  • Hong etal. (2022)Geng Hong, Zhemin Yang, Sen Yang, Xiaojing Liao, Xiaolin Du, Min Yang, and Haixin Duan. 2022.Analyzing Ground-Truth Data of Mobile Gambling Scams.2022 IEEE Symposium on Security and Privacy (SP) (2022), 2176–2193.https://api.semanticscholar.org/CorpusID:251143836
  • John etal. (2011)JohnP John, Fang Yu, Yinglian Xie, Arvind Krishnamurthy, and Martin Abadi. 2011.deSEO: Combating Search-Result Poisoning.. In USENIX security symposium. 1–15.
  • Kalyanam and Mackey (2017)Janani Kalyanam and Timothy Mackey. 2017.Detection and Characterization of Illegal Marketing and Promotion of Prescription Drugs on Twitter.CoRR abs/1712.00507 (2017).arXiv:1712.00507http://arxiv.org/abs/1712.00507
  • Kanich etal. (2011)Chris Kanich, Nicholas Weaver, Damon McCoy, Tristan Halvorson, Christian Kreibich, Kirill Levchenko, Vern Paxson, GeoffreyM Voelker, and Stefan Savage. 2011.Show Me the Money: Characterizing Spam-advertised Revenue.. In USENIX Security Symposium, Vol.35.
  • Katsuki etal. (2015)Takeo Katsuki, TimKen Mackey, and Raphael Cuomo. 2015.Establishing a Link Between Prescription Drug Abuse and Illicit Online Pharmacies: Analysis of Twitter Data.J Med Internet Res 17, 12 (16 Dec 2015), e280.https://doi.org/10.2196/jmir.5144
  • Koroteev (2021)MV Koroteev. 2021.BERT: a review of applications in natural language processing and understanding.arXiv preprint arXiv:2103.11943 (2021).
  • Kotzias etal. (2016)Platon Kotzias, Leyla Bilge, and Juan Caballero. 2016.Measuring PUP Prevalence and PUP Distribution through Pay-Per-Install Services.. In USENIX Security Symposium. 739–756.
  • Kotzias etal. (2021)Platon Kotzias, Juan Caballero, and Leyla Bilge. 2021.How did that get in my phone? unwanted app distribution on android devices. In 2021 IEEE Symposium on Security and Privacy (SP). IEEE, 53–69.
  • langmodel (2023)langmodel 2023.The language model.https://en.wikipedia.org/wiki/Language_model.
  • Leontiadis etal. (2011)Nektarios Leontiadis, Tyler Moore, and Nicolas Christin. 2011.Measuring and Analyzing Search-Redirection Attacks in the Illicit Online Prescription Drug Trade.. In USENIX Security Symposium, Vol.11.
  • Leontiadis etal. (2014)Nektarios Leontiadis, Tyler Moore, and Nicolas Christin. 2014.A nearly four-year longitudinal study of search-engine poisoning. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security. 930–941.
  • Li etal. (2021)Zhengyi Li, Xiangyu Du, Xiaojing Liao, Xiaoqian Jiang, and Tiffany Champagne-Langabeer. 2021.Demystifying the dark web opioid trade: content analysis on anonymous market listings and forum posts.Journal of Medical Internet Research 23, 2 (2021), e24486.
  • Liao etal. (2016a)Xiaojing Liao, Kan Yuan, XiaoFeng Wang, Zhongyu Pei, Hao Yang, Jianjun Chen, Haixin Duan, Kun Du, Eihal Alowaisheq, Sumayah Alrwais, etal. 2016a.Seeking nonsense, looking for trouble: Efficient promotional-infection detection through semantic inconsistency search. In 2016 IEEE Symposium on Security and Privacy (SP). IEEE, 707–723.
  • Liao etal. (2016b)Xiaojing Liao, Kan Yuan, XiaoFeng Wang, Zhongyu Pei, Hao Yang, Jianjun Chen, Haixin Duan, Kun Du, Eihal Alowaisheq, Sumayah Alrwais, Luyi Xing, and Raheem Beyah. 2016b.Seeking Nonsense, Looking for Trouble: Efficient Promotional-Infection Detection through Semantic Inconsistency Search. In 2016 IEEE Symposium on Security and Privacy (SP). 707–723.https://doi.org/10.1109/SP.2016.48
  • Lin ([n. d.])Yi-ChiehJessica Lin. [n. d.].Fake Stuff: China and the Rise of Counterfeit Goods.Routledge.
  • McCoy etal. (2012)Damon McCoy, Andreas Pitsillidis, Grant Jordan, Nicholas Weaver, Christian Kreibich, Brian Krebs, GeoffreyM Voelker, Stefan Savage, and Kirill Levchenko. 2012.Pharmaleaks: Understanding the business of online pharmaceutical affiliate programs. In Proceedings of the 21st USENIX conference on Security symposium. 1–1.
  • Mikolov etal. (2013)Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013.Efficient estimation of word representations in vector space.arXiv preprint arXiv:1301.3781 (2013).
  • Miller etal. (2014)Zachary Miller, Brian Dickinson, William Deitrick, Wei Hu, and AlexHai Wang. 2014.Twitter spammer detection using data stream clustering.Information Sciences 260 (2014), 64–73.
  • Soska and Christin (2015)Kyle Soska and Nicolas Christin. 2015.Measuring the longitudinal evolution of the online anonymous marketplace ecosystem. In 24th USENIX Security Symposium (USENIX Security 15). 33–48.
  • Thomas etal. (2016)Kurt Thomas, Juan AElices Crespo, Ryan Rasti, JeanMichel Picod, Cait Phillips, Marc-André Decoste, Chris Sharp, Fabio Tirelo, Ali Tofigh, Marc-Antoine Courteau, etal. 2016.Investigating Commercial Pay-Per-Install and the Distribution of Unwanted Software.. In USENIX Security Symposium. 721–739.
  • Thomas etal. (2011a)Kurt Thomas, Chris Grier, Justin Ma, Vern Paxson, and Dawn Song. 2011a.Design and evaluation of a real-time url spam filtering service. In 2011 IEEE symposium on security and privacy. IEEE, 447–462.
  • Thomas etal. (2011b)Kurt Thomas, Chris Grier, Dawn Song, and Vern Paxson. 2011b.Suspended accounts in retrospect: an analysis of twitter spam. In Proceedings of the 2011 ACM SIGCOMM conference on Internet measurement conference. 243–258.
  • Thomas etal. (2013)Kurt Thomas, Damon McCoy, Chris Grier, Alek Kolcz, and Vern Paxson. 2013.Trafficking Fraudulent Accounts: The Role of the Underground Market in Twitter Spam and Abuse.. In USENIX Security Symposium. 195–210.
  • West and Bhattacharya ([n. d.])Jarrod West and Maumita Bhattacharya. [n. d.].Intelligent financial fraud detection: A comprehensive review.57 ([n. d.]), 47–66.https://doi.org/10.1016/j.cose.2015.09.005
  • Xu etal. (2016)Hailu Xu, Weiqing Sun, and Ahmad Javaid. 2016.Efficient spam detection across online social networks. In 2016 IEEE International Conference on Big Data Analysis (ICBDA). IEEE, 1–6.
  • Yang etal. (2013)Chao Yang, Robert Harkreader, and Guofei Gu. 2013.Empirical evaluation and new design for fighting evolving twitter spammers.IEEE Transactions on Information Forensics and Security 8, 8 (2013), 1280–1293.
  • Yang etal. (2019)Hao Yang, Kun Du, Yubao Zhang, Shuang Hao, Zhou Li, Mingxuan Liu, Haining Wang, Haixin Duan, Yazhou Shi, Xiaodong Su, etal. 2019.Casino royale: a deep exploration of illegal online gambling. In Proceedings of the 35th Annual Computer Security Applications Conference. 500–513.
  • Yang etal. (2017a)Hao Yang, Xiulin Ma, Kun Du, Zhou Li, Haixin Duan, Xiaodong Su, Guang Liu, Zhifeng Geng, and Jianping Wu. 2017a.How to learn klingon without a dictionary: Detection and measurement of black keywords used by the underground economy. In 2017 IEEE Symposium on Security and Privacy (SP). IEEE, 751–769.
  • Yang etal. (2017b)Hao Yang, Xiulin Ma, Kun Du, Zhou Li, Haixin Duan, Xiaodong Su, Guang Liu, Zhifeng Geng, and Jianping Wu. 2017b.How to Learn Klingon without a Dictionary: Detection and Measurement of Black Keywords Used by the Underground Economy. In 2017 IEEE Symposium on Security and Privacy (SP). 751–769.https://doi.org/10.1109/SP.2017.11
  • Yang etal. (2021)Ronghai Yang, Xianbo Wang, Cheng Chi, Dawei Wang, Jiawei He, Siming Pang, and WingCheong Lau. 2021.Scalable Detection of Promotional Website Defacements in Black Hat SEO Campaigns. In 30th USENIX Security Symposium (USENIX Security 21). USENIX Association, 3703–3720.https://www.usenix.org/conference/usenixsecurity21/presentation/yang-ronghai
  • Yuan etal. (2018)Kan Yuan, Haoran Lu, Xiaojing Liao, and XiaoFeng Wang. 2018.Reading Thieves’ Cant: Automatically Identifying and Understanding Dark Jargons from Cybercrime Marketplaces.. In USENIX Security Symposium. 1027–1041.
  • Yuan etal. (2019)Kan Yuan, Di Tang, Xiaojing Liao, XiaoFeng Wang, Xuan Feng, Yi Chen, Menghan Sun, Haoran Lu, and Kehuan Zhang. 2019.Stealthy p*rn: Understanding Real-World Adversarial Images for Illicit Online Promotion. In 2019 IEEE Symposium on Security and Privacy (SP). 952–966.https://doi.org/10.1109/SP.2019.00032
  • Zhao etal. (2016)Kangzhi Zhao, Yong Zhang, Chunxiao Xing, Weifeng Li, and Hsinchun Chen. 2016.Chinese underground market jargon analysis based on unsupervised learning. In 2016 IEEE Conference on Intelligence and Security Informatics (ISI). IEEE, 97–102.
  • Zheng etal. (2019)Panpan Zheng, Shuhan Yuan, Xintao Wu, Jun Li, and Aidong Lu. 2019.One-class adversarial nets for fraud detection. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol.33. 1286–1293.
  • Zhou etal. (2018)Yadong Zhou, Ximi Wang, Junjie Zhang, Peng Zhang, Lili Liu, Huan Jin, and Hongbo Jin. 2018.Analyzing and Detecting Money-Laundering Accounts in Online Social Networks.IEEE Network 32, 3 (2018), 115–121.https://doi.org/10.1109/MNET.2017.1700213
  • Zhu etal. (2021)Wanzheng Zhu, Hongyu Gong, Rohan Bansal, Zachary Weinberg, Nicolas Christin, Giulia Fanti, and Suma Bhat. 2021.Self-supervised euphemism detection and identification for content moderation. In 2021 IEEE Symposium on Security and Privacy (SP). IEEE, 229–246.

Appendix A Extra PIP Examples of Various Categories

Figure 8 presents typical examples of illicit promotion posts on Twitter.

[Figure 8: Typical examples of PIPs across categories.]

Appendix B The List of Jargon Words as Observed from PIPs

Examples of jargon words observed from PIPs are listed in Table 12 along with descriptions.

Word     English            Category          Description
蓝精灵   The Smurfs         Drug              Ecstasy
燃料     Fuel               Drug              Alias of marijuana
叶子     Leaves             Drug              Marijuana
飞叶子   Flying leaves      Drug              Smoking marijuana
飞行     Fly                Drug              The behavior of smoking marijuana
机长     Pilot              Drug              Drug dealer selling marijuana
农夫     Farmer             Drug              Person who grows marijuana
鲍鱼     Abalone            p*rnography       Female genitalia
遛鸟     Walking the birds  p*rnography       Men exposing their genitals in public or outdoors
呦呦     Yoyo               p*rnography       Used to describe child p*rn
四件套   Four-piece set     Data Leakage      Selling the "four-piece set" of bank cards (mobile phone card, bank card, U-disk, and photocopy of the ID card)
跑分     Benchmarking       Money Laundering  Using a third-party payment account to collect funds on behalf of others, then transferring the funds to earn a commission
渔夫     Fisherman          Money Laundering  Phishing using one's own personal number; responsible for finding victims
船长     Captain            Money Laundering  Responsible for setting up phishing sites
黄河     Yellow River       Weapon            A brand of gun
海螺     Conch              Gambling          A board/card and electronic gaming system