You are on page 1of 9

Influence and Passivity in Social Media

Daniel M. Romero Wojciech Galuba


Cornell University EPFL
Center for Applied Distributed Information
Mathematics Systems Lab
Ithaca, NY, USA Lausanne, Switzerland
dmr239@cornell.edu wojciech.galuba@epfl.ch

Sitaram Asur Bernardo A. Huberman


Social Computing Lab Social Computing Lab
HP Labs HP Labs
Palo Alto, California, USA Palo Alto, California, USA
sitaram.asur@hp.com bernardo.huberman@hp.com

ABSTRACT ing companies and celebrities eager to exploit this vast new
The ever-increasing amount of information flowing through medium. As a result, ideas, opinions, and products compete
Social Media forces the members of these networks to com- with all other content for the scarce attention of the user
pete for attention and influence by relying on other people community. In spite of the seemingly chaotic fashion with
to spread their message. A large study of information propa- which all these interactions take place, certain topics man-
gation within Twitter reveals that the majority of users act age to get an inordinate amount of attention, thus bubbling
as passive information consumers and do not forward the to the top in terms of popularity and contributing to new
content to the network. Therefore, in order for individuals trends and to the public agenda of the community. How
to become influential they must not only obtain attention this happens in a world where crowdsourcing dominates is
and thus be popular, but also overcome user passivity. We still an unresolved problem, but there is considerable consen-
propose an algorithm that determines the influence and pas- sus on the fact that two aspects of information transmission
sivity of users based on their information forwarding activ- seem to be important in determining which content receives
ity. An evaluation performed with a 2.5 million user dataset inordinate amounts of attention.
shows that our influence measure is a good predictor of URL One is the popularity and status of given members of these
clicks, outperforming several other measures that do not ex- social networks, which is measured by the level of atten-
plicitly take user passivity into account. We also explicitly tion they receive in the form of followers who create links
demonstrate that high popularity does not necessarily imply to their accounts to automatically receive the content they
high influence and vice-versa. generate. The other is the influence that these individuals
wield, which is determined by the actual propagation of their
content through the network. This influence is determined
1. INTRODUCTION by many factors, such as the novelty and resonance of their
The explosive growth of Social Media has provided mil- messages with those of their followers and the quality and
lions of people the opportunity to create and share content frequency of the content they generate. Equally important
on a scale barely imaginable a few years ago. Massive partic- is the passivity of members of the network which provides a
ipation in these social networks is reflected in the countless barrier to propagation that is often hard to overcome. Thus
number of opinions, news and product reviews that are con- gaining knowledge of the identity of influential and least pas-
stantly posted and discussed in social sites such as Facebook, sive people in a network can be extremely useful from the
Digg and Twitter, to name a few. Given this widespread perspectives of viral marketing, propagating one’s point of
generation and consumption of content, it is natural to tar- view, as well as setting which topics dominate the public
get one’s messages to highly connected people who will prop- agenda.
agate them further in the social network. This is particularly In this paper, we analyze the propagation of web links on
the case in Twitter, which is one of the fastest growing so- twitter over time to understand how attention to given users
cial networks on the Internet, and thus the focus of advertis- and their influence is determined. We devise a general model
for influence using the concept of passivity in a social net-
work and develop an efficient algorithm similar to the HITS
algorithm [14] to quantify the influence of all the users in the
network. Our influence measure utilizes both the structural
Permission to make digital or hard copies of all or part of this work for properties of the network as well as the diffusion behavior
personal or classroom use is granted without fee provided that copies are among users. The influence of a user thus depends on not
not made or distributed for profit or commercial advantage and that copies only the size of the influenced audience, but also on their
bear this notice and the full citation on the first page. To copy otherwise, to passivity. This differentiates it from earlier measures of in-
republish, to post on servers or to redistribute to lists, requires prior specific fluence which were primarily based on individual statistical
permission and/or a fee.
Copyright 20XX ACM X-XXXXX-XX-X/XX/XX ...$10.00.
properties such as the number of followers or retweets [7].
We have shown through our extensive evaluation that
this influence model outperforms other measures of influ-
ence such as PageRank, H-index, the number of followers 3.1 Background on Twitter
and the number of retweets. In addition it has good predic- Twitter is an extremely popular online microblogging ser-
tive properties in that it can forecast in advance the upper vice, that has gained a very large user base, consisting of
bound on the number of clicks a URL can get. We have more than 105 million users (as of April 2010). The Twitter
also presented case studies showing the top influential users graph is a directed social network, where each user chooses
uncovered by our algorithm. An important conclusion from to follow certain other users. Each user submits periodic
the results is that the correlation between popularity and status updates, known as tweets, that consist of short mes-
influence is quite weak, with the most influential users not sages limited in size to 140 characters. These updates typi-
necessarily the ones with the highest popularity. Addition- cally consist of personal information about the users, news
ally, when we considered nodes with high passivity, we found or links to content such as images, video and articles. The
a majority of them to be spammers and robot users. This posts made by a user are automatically displayed on the
demonstrates an application of our algorithm in automatic user’s profile page, as well as shown to his followers.
user categorization and filtering of online content . A retweet is a post originally made by one user that is for-
warded by another user. Retweets are useful for propagating
2. RELATED WORK interesting posts and links through the Twitter community.
The study of information and influence propagation in Twitter has attracted lots of attention from corporations
social networks has been particularly active for a number for the immense potential it provides for viral marketing.
of years in fields as disparate as sociology, communication, Due to its huge reach, Twitter is increasingly used by news
marketing, political science and physics. Earlier work fo- organizations to disseminate news updates, which are then
cused on the effects that scale-free networks and the affinity filtered and commented on by the Twitter community. A
of their members for certain topics had on the propagation number of businesses and organizations are using Twitter
of information [6]. Others discussed the presence of key or similar micro-blogging services to advertise products and
influentials [12, 11, 8, 5, 10] in a social network, defined as disseminate information to stockholders.
those who are responsible for the overall information dis- 3.2 Dataset
semination in the network. This research highlighted the
value of highly connected individuals as key elements in the Twitter provides a Search API for extracting tweets con-
propagation of information through the network. taining particular keywords. To obtain the dataset for this
Huberman et al. [2] studied the social interactions on study, we continuously queried the Twitter Search API for
Twitter to reveal that the driving process for usage is a a period of 300 hours starting on 10 Sep 2009 for all tweets
sparse hidden network underlying the friends and follow- containing the string http. This allowed us to acquire a
ers, while most of the links represent meaningless interac- continuous stream of 22 million tweets with URLs, which
tions. Jansen et al. [3] have examined twitter as a mecha- we estimated to be 1/15th of the entire Twitter activity at
nism for word-of-mouth advertising. They considered par- that time. From each of the accumulated tweets, we ex-
ticular brands and products and examined the structure of tracted the URL mentions. Each of the unique 15 million
the postings and the change in sentiments. Galuba et al. [4] URLs in the data set was then checked for valid formatting
propose a propagation model that predicts, which users will and the URLs shortened via the services such as bit.ly or
tweet about which URL based on the history of past user tinyurl.com were expanded into their original form by fol-
activity. lowing the HTTP redirects. For each encountered unique
There have also been earlier studies focused on social in- user ID, we queried the Twitter API for metadata about
fluence and propagation. Agarwal et al. [8] have examined that user and in particular the user’s followers and followees.
the problem of identifying influential bloggers in the blogo- The end result was a dataset of timestamped URL mentions
sphere. They discovered that the most influential bloggers together with the complete social graph for the users con-
were not necessarily the most active. Aral et al [9] have cerned.
distinguished the effects of homophily from influence as mo- User graph. The user graph contains those users whose
tivators for propagation. As to the study of influence within tweets appeared in the stream, i.e., users that during the
twitter, Cha et al. [7] have performed a comparison of three 300 hour observation period posted at least one public tweet
different measures of influence - indegree, retweets and user containing a URL. The graph does not contain any users who
mentions. They discovered that while retweets and men- do not mention any URLs in their tweets or users that have
tions correlated well with each other, the indegree of users chosen to make their Twitter stream private.
did not correlate well with the other two measures. Based For each newly encountered user ID, the list of followed
on this, they hypothesized that the number of followers may users was only fetched once. Our data set does not capture
not a good measure of influence. On the other hand, Weng the changes occurring in the user graph over the observation
et al [5] have proposed a topic-sensitive PageRank measure period.
for influence in Twitter. Their measure is based on the fact
that they observed high reciprocity among follower relation- 4. THE IP ALGORITHM
ships in their dataset, which they attributed to homophily. Evidence for passivity. The users that receive informa-
However, other work [7] has shown that the reciprocity is tion from other users may never see it or choose to ignore
low overall in Twitter and contradicted the assumptions of it. We have quantified the degree to which this occurs on
this work. Twitter (Fig. 4). An average Twitter user retweets only one
in 318 URLs, which is a relatively low value. The retweeting
3. TWITTER rates vary widely across the users and the small number of
Algorithm 1: The Influence-Passivity (IP) algorithm
106
audience retweeting rate I0 ← (1, 1, . . . , 1) ∈ R|N | ;
105 user retweeting rate P0 ← (1, 1, . . . , 1) ∈ R|N | ;
for i = 1 to m do
104
Update Ri using operation (2) and the values Ii−1 ;
#users

103 Update Ii using operation (1) and the values Ri ;


for j = 1 to |N | do
102 Ij
Ij = X ;
101 Ik
k∈N
100 -8 Rj
10 10-7 10-6 10-5 10-4 10-3 10-2 10-1 100 Pj = X ;
retweeting rate Rk
k∈N
end
end
Figure 1: Evidence for the Twitter user passivity.
Return (Im , Pm );
We measure passivity by two metrics: 1. the user
retweeting rate and 2. the audience retweeting rate.
The user retweeting rate is the ratio between the
number of URLs that user i decides to retweet to Operation. The algorithm iteratively computes both the
the total number of URLs user i received from the passivity and influence scores simultaneously in a similar
followed users. The audience retweeting rate is the manner as in the HITS algorithm for finding authoritative
ratio between the number of user i’s URLs that were web pages and hubs that link to them [14].
retweeted by i’s followers to the number of times a Given a weighted directed graph G = (N, E, W ) with
follower of i received a URL from i. nodes N , arcs E, and arc weights W , where the weights
wij on arc e = (i, j) represent the ratio of influence that i
the most active users play an important role in spreading the exerts on j to the total influence that i attempted to exert
information in Twitter. This suggests that the level of user on j, the IP algorithm outputs a function I : N → [0, 1],
passivity should be taken into account for the information which represents the nodes’ relative influence on the net-
spread models to be accurate. work, and a function P : N → [0, 1] which represents the
Assumptions. Twitter is used by many people as a tool nodes’ relative passivity of the network.
for spreading their ideas, knowledge, or opinions to others. For every arc e = (i, j) ∈ E, we define the acceptance rate
wi,j
An interesting and important question is whether it is pos- by uij = X . This value represents the amount
sible to identify those users who are very good at spreading wkj
their content, not only to those who choose to follow them, k:(k,j)∈E

but to a larger part of the network. It is often fairly easy of influence that user j accepted from user i normalized
to obtain information about the pairwise influence relation- by the total influence accepted by j from all users on the
ships between users. In Twitter, for example, one can mea- network. The acceptance rate can be viewed as the dedi-
sure how much influence user A has on user B by counting cation or loyalty user j has to user i. On the other hand,
the number of times B retweeted A. However, it is not very for every e = (j, i) ∈ E we define the rejection rate by
clear how to use the pairwise influence information to accu- 1 − wji
vji = X . Since the value 1 − wji is amount
rately obtain information about the relative influence each (1 − wjk )
user has on the whole network. To answer this question we k:(j,k)∈E
design an algorithm (IP) that assigns a relative influence of influence that user i rejected from j, then the value vji
score and a passivity score to every user. The passivity of represents the influence that user i rejected from user j nor-
a user is a measure of how difficult it is for other users to malized by the total influence rejected from j by all users in
influence him. We assume that the influence of a user de- the network.
pends on both the quantity and the quality of the audience The algorithm is based on the following operations:
she influences. In general, our model makes the following X
assumptions: Ii ← uij Pj (1)
j:(i,j)∈E
1. A user’s influence score depends on the number of peo- X
ple she influences as well as their passivity. Pi ← vji Ij (2)
j:(j,i)∈E
2. A user’s influence score depends on how dedicated the
people she influences are. Dedication is measured by Each term on the right hand side of the above operations
the amount of attention a user pays to a given one as corresponds to one of the listed assumptions. In operation
compared to everyone else. 1 the term Pj corresponds to assumption 1 and the term
3. A user’s passivity score depends on the influence of uij corresponds to assumption 2. In operation 2 the term Ij
those who she’s exposed to but not influenced by. corresponds to assumption 3 and the term vji corresponds
to assumption 4. The Influence-Passivity algorithm (Algo-
4. A user’s passivity score depends on how much she re- rithm 1) takes the graph G as the input and computes the
jects other user’s influence compared to everyone else. influence and passivity for each node in m iterations.
Generating the input graph. There are many ways of
defining the influence graph G = (N, E, W ). We choose to 101
construct it by taking into account retweets and the follower
graph in the following way: The nodes are users who tweeted 100

change in values
at least 3 URLs. The arc (i, j) exists if user j retweeted a
URL posted by user i at least once. The arc e = (i, j) has
weight we = Qij
S
where Qi is the number of URLs that i 10-1
ij
mentioned and Sij is the number of URLs mentioned by i
and retweeted by j. 10-2

5. EVALUATION 10-3 0 10 20 30 40 50
iteration
5.1 Computations
Based on the obtained dataset (§3.2) we generate the
weighted graph using the method described in §4. The
graph consists of approximately 450k nodes and 1 million Figure 2: IP-algorithm convergence. In each itera-
arcs with mean weight of 0.07, and we use it to compute tion we measure the sum of all the absolute changes
the PageRank, influence and passivity values for each node. of the computed influence and passivity values since
The Influence-Passivity algorithm (Algorithm §1) converges the previous iteration
to the final values in tens of iterations (Fig. 2).
PageRank. The PageRank algorithm has been widely
used to rank web pages as well as people based on their au- URL traffic correlation. Using the URL click data,
thority and their influence [13, 5]. In order to compare it we take several different user attributes and test how well
with the results from the IP algorithm, we compute PageR- they can predict the attention the URLs posted by the users
ank on the weighted graph G = (N, E, W ) with a small receive (Fig. 3). It is important to note that none of the
change. First, since the arcs e = (i, j) ∈ E indicate that influence measures are capable of predicting the exact num-
user i exerts some influence on user j then we invert all the ber of clicks. The main reason for this is that the amount of
arcs before running PageRank on the graph while leaving attention a URL gets is not only a function of the influence
the weights intact. In other words, we generate a new graph of the users mentioning it, but also of many other factors
G0 = (N 0 , E 0 , W 0 ) where N 0 = N , E 0 = {(i, j) : (j, i) ∈ E}, including the virality of the URL itself and more impor-
and for each (i, j) ∈ E 0 we define wij 0
= wji . This generates tantly, whether the URL was mentioned anywhere outside
a new graph G0 analogous to G but where the influenced of Twitter, which is likely to be the biggest source of unpre-
users point to their influencers. Second, since the graph dictability in the click data. The click data that we collected
G0 is weighted we assume that when the the random surfer represents the total clicks on the URLs.
of the PageRank algorithm is currently at the node i, she The wide range of factors potentially affecting the Bit.ly
0
wij
chooses to visit node j next with probability X . clicks may prevent us from predicting their number accu-
0
wik rately. However, the upper bound on that number can to
k:(i,k)∈E 0 a large degree be predicted. To eliminate the outlier cases,
The Hirsch Index. The Hirsch index (or H-index) is we examined how the 99.9th percentile of the clicks varied
used in the scientific community in order to measure the as the measure of influence increased.
productivity and impact of a scientist. A scientist has index Number of followers. The most readily available and
h if he has published h articles which have been cited at least often used by the Twitterers measure of influence is the num-
h times each. It has been shown that the H-index is a good ber of followers a user has. As the Figure 3(a) shows, the
indicator of whether a scientist has had high achievements number of followers of an average poster of a given URL is a
such as getting the Nobel prize [16]. Analogously, in Twitter, relatively weak predictor of the maximum number of clicks
a user has index h if h of his URL posts have been retweeted that the URL can receive.
at least h times each. Number of retweets. When users post URLs their
posts might be retweeted by other users. Each retweet ex-
5.2 Influence as a correlate of attention plicitly credits the original poster of the URL (or the user
Any measure of influence is necessarily a subjective one. from whom the retweeting user heard about the URL). The
However, in this case, a good measure of influence should number of times a user has been credited in a retweet has
have a high predictive power on how well the URLs men- been assumed to be a good measure of influence [7]. How-
tioned by the influential users attract attention and propa- ever, Figure 3(b) shows that the number of times a user has
gate in the social network. We would expect the URLs that been retweeted in the past is a poor predictor of the max-
highly influential users propagate to attract a lot of atten- imum number of clicks the URLs posted by that user can
tion and user clicks. Thus, a viable estimator of attention is get.
the number of times a URL has been accessed. PageRank. Figure 3(c) shows that the average PageR-
Click data. Bit.ly is a URL shortening service that for ank of those who tweet a certain URL does not correlate
each shortened URL keeps track of how many times it has well with the number of clicks the URL will get. One of the
been accessed. For the 3.2M Bit.ly URLs found in the tweets main differences between the IP algorithm and PageRank is
we have queried the Bit.ly API for the number of clicks the that the IP algorithm takes into account the passivity of the
service has registered on that URL. people a user influences and PageRank does not.IP-influence
107 107
10000
106 106
10000
105 1000 105
# bit.ly clicks

# bit.ly clicks
104 104 1000

#urls

#urls
103 100 103 100
102 102
10 10
101 101
100 −1 1 100 −3 1
10 100 101 102 103 104 105 106 10 10−2 10−1 100 101 102 103 104
average # of followers of the posting users average # times posting users were retweeted

(a) Average number of followers vs. number of clicks (b) Average number of times users were retweeted vs.
on URLs number of clicks on URLs

107
1e+05 107 1e+05
106 106
1e4 1e4
105 105
# bit.ly clicks

# bit.ly clicks
104 1000 104 1000
#urls

#urls
103 103 100
100
102 102
10 10
101 101
100 −7 1 100 0 1
10 10−6 10−5 10−4 10−3 10−2 10 101 102
average PageRank of the posting users average H-index of the posting users

(c) Average user PageRank vs. number of clicks on (d) Average user H-index vs. number of clicks on
URLs URLs

107 10000 107 10000


106 106
105 1000 105 1000
# bit.ly clicks

# bit.ly clicks

104 104
#urls

#urls
100 100
103 103
102 102
10 10
101 101
100 −9 100 −9
10 10−8 10−7 10−6 10−5 10−4 10−3 10−2 10 10−8 10−7 10−6 10−5 10−4 10−3
average IP-influence of the posting users average IP-influence of the posting users

(e) Average user IP-influence vs. number of clicks on (f) Average user IP-influence vs. number of clicks on
URLs, using the retweet graph as input URLs, using the co-mention graph as input

Figure 3: We consider several user attributes: the number of followers, the number of times a user has been
retweeted, the user’s PageRank, H-index and IP-influence. For each of the 3.2M Bit.ly URLs we compute
the average value of a user’s attribute among all the users that mentioned that URL. This value becomes the
x coordinate of the URL-point; the y coordinate is the number of clicks on the Bit.ly URL. The density of
the URL-points is then plotted for each of the four user attributes. The solid line in each figure represents
the 99.9th percentile of Bit.ly clicks at a given attribute value. The dotted line is the linear regression fit for
the solid line.
10−2 10−3
1000
1000

IP-influence (retweet graph)


10−3 10−4
−4
10
10−5
100
IP-influence

10−5 100

#users

#users
10−6
10−6
10 10−7 10
10−7
−8
10−8 10

10−9 0 10−9 −9
10 101 102 103 104 105 106 10 10−8 10−7 10−6 10−5 10−4 10−3 10−2
# of followers IP-influence (co-mention graph)

Figure 4: For each user we place a user-point with Figure 5: The correlation between the IP-influence
IP-influence as the y coordinate and the x coordinate values computed based on two inputs: the co-
set to the number of user’s followers. The density mention influence graph and the retweet influence
of user-points is represented in grayscale. graph.

is a much better indicator of URL popularity than PageR-


6. IP ALGORITHM ADAPTABILITY
ank. This suggests that influencing users who are difficult to As mentioned earlier (§4) there are many ways of defining
influence, as opposed to simply influencing many users, has a social graph in which the edges indicate pairwise influence.
a positive impact on the eventual popularity of the message We have so far been using the graph based on which user
that a user tweets. retweeted which user (retweet influence graph). However,
The Hirsch Index. Figure 3(d) shows that despite the that explicitly signals of influence such as retweets are not
fact that in the scientific community the H-index is used as a always available. One way of overcoming this obstacle is
good predictor of high achievements, in Twitter it does not to use other, possibly weaker, signals of influence. In the
correlate well with URL popularity. This may reflect the fact case of Twitter, we can define an influence graph based on
that attention in the scientific community plays a symmetric mentions of URLs without regard of actual retweeting in the
role, since those who pay attention to the work of others following way.
also seek it from the same community. Thus, citations play The co-mention graph. The nodes of the co-mention
a strategic role in the successful publishing of papers, since influence graph are users who tweeted at least three URLs.
the expectation of authors is that referees and authors will The edge (i, j) exists if user j follows user i and j mentioned
demand attention to their work and those of their colleagues. at least one URL that i had previously mentioned. The edge
Sij
Within Social Media such symmetry does not exist and thus e = (i, j) has weight we = Fij +S ij
where Fij is the number
the decision to forward a message to the network lacks this of URLs that i mentioned and j never did and S is the
particularly strategic value. number of URLs mentioned by j and previously mentioned
IP-Influence score. As we can see in Figure 3(e), the by i.
average IP-influence of those who tweeted a certain URL can The resulting graph has the disadvantage that the edges
determine the maximum number of clicks that a URL will are based on a much less explicit notion of influence than
get. Since the URL clicks are never considered by the IP al- when based on retweets. Therefore the graph could have
gorithm to compute the user’s influence, the fact that we find edges between users who do not influence each other. On
a very clear connection between average IP-influence and the the other hand, the retweeting conventions on Twitter are
eventual popularity of the URLs (measured by clicks) serves not uniform and therefore sometimes users who repost a
as an unbiased evaluation of the algorithm and exposes the URL do not necessarily credit the correct source of the URL
power of IP-influence. For example, as we can see in Fig- with a retweet [15]. Hence, the influence graph based on
ure 3(e), given a group of users having very large average retweets has potentially missing edges.
IP-influence scores who post a URL we can estimate, with Since the IP algorithm has the flexibility of allowing any
99.9% certainty, that this URL will not receive more than influence graph as input, we can compute the influence scores
100, 000 clicks. On the other hand, if a group of users with of the users based on the co-mention influence graph and
very low average IP-influence score post the same URL we compare with the results obtained from the retweet influence
can estimate, with 99.9% certainty that the URL will not graph. As we can see in Figure 3(f), we find that the retweet
receive more than 100 clicks. graph yields influence scores that are better at predicting the
Furthermore, figure 4 shows that a user’s IP-influence is maximum number of clicks a URL will obtain than the co-
not well correlated with the number of followers she has. mention influence graph. Nevertheless, Figures 3(f), 3(c),
This reveals interesting implications about the relationship 3(a), and 3(b) show that the influence values obtained from
between a person’s popularity and the influence she has on the co-mention influence graph are still better at predicting
other people. In particular, it shows that having many fol- URL traffic than other measures such as PageRank, number
lowers on Twitter does not imply power to influence them of followers, H-index or the total number of times a user
to even click on a URL. has been retweeted. Furthermore, Figure 5 shows that the
mashable Social Media Blogger redscarebot Keyword Aggregator
jokoanwar Film Director drunk bot Suspended
google Google News tea robot Keyword Aggregator
aplusk Actor condos Listing Aggregator
syfy Science Fiction Channel wootboot Suspended
smashingmag Online Developer Magazine raybeckerman Attorney
michellemalkin Conservative Commentator hashphotography Keyword Aggregator
theonion News Satire Organization charlieandsandy Suspended
rww Tech/Social Media Blogger ms defy Suspended
breakingnews News Aggregator rpattinsonbot Keyword Aggregator

Table 1: Users with the most IP-influence (with at Table 2: Users with the most IP-passivity
least 10 URLs posted in the period)

also able identify users with very low number of followers


influence score based on both graphs do not correlate well, but high influence. Table 4 shows the users with the most
which suggests that considering explicit vs. implicit signals influence who rank less that 100, 000 in number of follow-
of influence can drastically change the outcome of the IP ers. We find that during the data collection period some
algorithm. In general, we find that the explicitness of the of the users in this category ran very successful retweeting
signal provided by the retweets yields slightly better results contests where users who retweeted their URLs would have
when it comes to predicting URL traffic, however, the influ- the chance of winning a prize. Moreover, there is a group
ence scores based on co-mentions may surface a different set of users who post from Twitdraw.com, a website where peo-
of potentially influential users. ple can make drawings and post them on Twitter. Even
though these users don’t have many followers, their draw-
ings are of very high quality and spread throughout Twitter
7. CASE STUDIES reaching many people. Other interesting users such as lo-
As we mentioned earlier, one important application of the cal politicians and political cartoonists are also found in the
IP algorithm is ranking users by their relative influence. In list. The IP-influence measure allows us to find interesting
this section, we present a series of rankings of Twitter users content posted by users who would otherwise be buried by
based on the influence, passivity, and number of followers. popularity rankings such as number of followers.
The most influential. Table 1 shows the users with the
most IP-influence in the network. We constrain the number
of URLs posted to 10 to obtain this list, which is dominated 8. DISCUSSION
by news services from politics, technology, and Social Media. Influence as predictor of attention. As we demon-
These users post many links which are forwarded by other strated in §5, the IP-influence of the users is an accurate
users, causing their influence to be high. predictor of the upper bound on the total number of clicks
The most passive. Table 2 shows the users with the they can get on the URLs they post. The input to the influ-
most IP-passivity in the network. Passive users are those ence algorithm is a weighted graph, where the arc weights
who follow many people, but retweet a very small percent- represent the influence of one user over another. This graph
age of the information they consume. Interestingly, robot can be derived from the user activity in many ways, even
accounts (which automatically aggregate keywords or spe- in cases where explicit feedback in the form of retweets or
cific content from any user on the network), suspended ac- “likes” is not available (§6).
counts (which are likely to be spammers), and users who Topic-based and group-based influence. The Influence-
post extremely often are among the users with the most Passivity algorithm can be run on a subpgraph of the full
IP-passivity. Since robots ”attend” to all existing tweets graph or on the subset of the user activity data. For ex-
and only retweet certain ones, the percentage of informa- ample, if only users tweeting about a certain topic are part
tion they forward from other users is actually very small. of the graph, the IP-influence determines the most influen-
This explains why the IP-algorithm assigns them such high tial users in that topic. It is an open question whether the
passivity scores. This also highlights a new application of IP algorithm would be equally accurate at different graph
the IP-algorithm: automatic identification of robot users in- scales.
cluding aggregators and spammers. Content ranking. The predictive power of IP-influence
The least influential with many followers. We have can be used for content filtering and ranking in order to
demonstrated that the amount of attention a person gets reveal content that is most likely to receive attention based
may not be a good indicator of the influence they have in on which users mentioned that content early on. Similarly,
spreading their message. In order to make this point more as in the case of users, this can be computed on a per-topic
explicit, we show, in Table 3, some examples of users who or per-user-group basis.
are followed by many people but have relatively low influ- Content filtering. We have observed from our passivity
ence. These users are very popular and have the attention experiments that highly passive users tend to be primarily
of millions of people but are not able to spread their mes- robots or spammers. This leads to an interesting extension
sage very far. In most cases, their messages are consumed of this work to perform content filtering, limiting the tweets
by their followers but not considered important enough to to influential users and thereby reducing spam in Twitter
forward to others. feeds.
The most influential with few followers. We are Influence dynamics. We have computed the influence
User name Category Rank by # followers Rank by IP-influence
thatkevinsmith Screen Writer 33 1000
nprpolitics Political News 41 525
eonline TV Channel 42 1008
marthastewart Television Host 43 1169
nba Sports 64 1041
davidgregory Journalist 106 3630
nfl Sports 110 2244
cbsnews News Channel 114 2278
jdickerson Journalist 147 4408
newsweek News Magazine 148 756

Table 3: Users with many followers and low relative influence

User name Category Rank by # followers Rank by IP-influence


cashcycle Retweet Contest 153286 13
mobiliens Retweet Contest 293455 70
jadermattos Twitdraw 227934 134
jaum Twitdraw 404385 143
robmillerusmc Congressional Candidate 147803 145
sitekulite Twitdraw 423917 149
jesse sublett Musician 385265 151
cyberaurora Tech News Website 446207 163
viveraxo Twitdraw 458279 165
fireflower Political Cartoons 452832 195

Table 4: Users with very few followers but high relative influence

measures over a fixed 300-hour window. However, the So- medium, regardless of their popularity.
cial Media are a rapidly changing, real-time communication
platform. There are several implications of this. First, the 10. REFERENCES
IP algorithm would need to be modified to take into ac- [1] Jure Leskovec, Lada A. Adamic and Bernardo A.
count the tweet timestamps. Second, the IP-influence it- Huberman. The dynamics of viral marketing. In
self changes over time, which brings a number of interesting Proceedings of the 7th ACM Conference on Electronic
questions about the dynamics of influence and attention. Commerce, 2006.
In particular, whether users with spikes of IP-influence are
[2] Bernardo A. Huberman, Daniel M. Romero, and Fang
overall more influential than users which can sustain their
Wu. Social networks that matter: Twitter under the
IP-influence over time is an open question.
microscope. First Monday, 14(1), Jan 2009.
[3] B. Jansen, M. Zhang, K. Sobel, and A. Chowdury.
9. CONCLUSION Twitter power: Tweets as electronic word of mouth.
Journal of the American Society for Information
Given the mushrooming popularity of Social Media, vast
Science and Technology, 2009.
efforts are devoted by individuals, governments and enter-
prises to getting attention to their ideas, policies, products, [4] Wojciech Galuba, Karl Aberer, Dipanjan Chakraborty,
and commentary through social networks. But the very Zoran Despotovic, Wolfgang Kellerer Outtweeting the
large scale of the networks underlying Social Media makes Twitterers - Predicting Information Cascades in
it hard for any of these topics to get enough attention in Microblogs 3rd Workshop on Online Social Networks,
order to rise to the most trending ones. Given this con- WOSN, 2010)
straint, there has been a natural shift on the part of the [5] Jianshu Weng and Ee-Peng Lim and Jing Jiang and Qi
content generators towards targeting those individuals that He. TwitterRank: finding topic-sensitive influential
are perceived as influential because of their large number twitterers. WSDM, 2010.
of followers. This study shows that the correlation between [6] Fang Wu, Bernardo A. Huberman, Lada Adamic and
popularity and influence is weaker than it might be expected. Josh Tyler. Information Flow in Social Groups. Physica
This is a reflection of the fact that for information to prop- A, Vol 337, 327-335, 2004.
agate in a network, individuals need to forward it to the [7] Meeyoung Cha and Hamed Haddadi and Fabricio
other members, thus having to actively engage rather than Benevenuto and Krishna P. Gummadi. Measuring User
passively read it and cease to act on it. Moreover, since our Influence in Twitter: The Million Follower Fallacy. 4th
measure of influence is not specific to Twitter it is applicable International AAAI Conference on Weblogs and Social
to many other social networks. This opens the possibility Media (ICWSM), 2010.
of discovering influential individuals within a network which [8] Nitin Agarwal and Huan Liu and Lei Tang and Philip
can on average have a further reach than others in the same S. Yu. Identifying the Influential Bloggers in a
Community. WSDM, 2008.
[9] Sinan Aral, Lev Muchnik and Arun Sundararajan.
Distinguishing influence-based contagion from
homophily-driven diffusion in dynamic networks.
Proceedings of the National Academy of Sciences, Vol.
106 (51), pp. 21544-21549, 2009.
[10] Duncan J. Watts and Peter Sheridan Dodds.
Influentials, Networks, and Public Opinion Formation.
Journal of Consumer Research, Vol. 34 (4), pp.
441-458, 2007.
[11] Amit Goyal, Francesco Bonchi and Laks V.S.
Lakshmanan. Learning Influence Probabilities In Social
Networks. WSDM, 2010.
[12] P. Domingos and M. Richardson. Mining the network
value of customers. SIGKDD, 2001.
[13] S. Brin and L. Page. The anatomy of a large-scale
hypertextual Web search engine. Computer Networks
and ISDN Systems, Vol 30, 1-7, 1998.
[14] Jon Kleinberg. Authoritative sources in a hyperlinked
environment. Journal of the ACM 46 (5): 604 -632,
1999.
[15] Boyd Danah, Scott Golder, and Gilad Lotan. Tweet,
Tweet, Retweet: Conversational Aspects of Retweeting
on Twitter. HICSS-43. IEEE 2010.
[16] Jorge E. Hirsch. An index to quantify an individual’s
scientific research output. Proceedings of the National
Academy of Sciences 102(46): 16569 -16572, 2005.

You might also like