Community detection
Surprise maximization
http://en.wikipedia.org/wiki/Surprise_(networks)
"the cohesion'" (triangles)
Adrien Friggeri, Guillaume Chelius, Eric Fleury. 2011. Triangles to Capture Social Cohesion. Third IEEE International Conference on Social Computing, Oct 2011, Cambridge, United States. http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6113123
Peer-reviewed conferences/proceedings
Adrien Friggeri, Guillaume Chelius, Eric Fleury. 2011. Complex Dynamics of Human Interactions, Sep 2011, Vienna, Austria.
Adrien Friggeri, Guillaume Chelius, Eric Fleury. 2011. Egomunities, Exploring Socially Cohesive Person-based Communities. NetSci 2011 The International School and Conference on Network Science, Jun 2011, Budapest, Hungary.
Adrien Friggeri, Guillaume Chelius, Eric Fleury. 2011. [ Fellows: Crowd-sourcing the evaluation of an overlapping community model based on the cohesion measure]. Interdisciplinary Workshop on Information and Decision in Social Networks, May 2011, Cambridge, United States. video
Adrien Friggeri, Eric Fleury. 1994. Maximizing the Cohesion is NP-hard. CoRR abs/1109.1994: (2011)
Aux: Chris Griffin. 2011. Graph theory. Seminar
fuzzy
Tamás Nepusz. 2008. Fuzzy communities and the concept of bridgeness in complex networks. PRE
Surveys
Anna Goldenberg, Alice X Zheng, Stephen E Fienberg, Edoardo M Airoldi 2009 A Survey of Statistical Network Models
Community detection by Surprise
Rodrigo Aldecoa, Ignacio Marín. 2011. Deciphering Community Structure by Surprise. PLoS ONE 6(9): e24195.
Spinglass detection (maximum contrast) diagonal blockmodels
As for the iGraph package, I've spoken with Gabor who maintains it at the NetSci Conference last year. He has implemented the community detection (diagonal block model) algorithm as spinglass.community. Your email reminds me that I should correspond with him about implementing the full block modeling approach.
- Joerg Reichardt
Possibility of implementing a generalization of (maximum contrast) diagonal blockmodels in iGraph to Reichardt-White 2008 blockmodels
I think both you, Luke, and Laurent, could use a generalization of the diagonal block model (igraph program) to the maximum block-contrast model that Joerg wrote for our article. He is busy with a conference at the moment, and working on publication of a new algorithm but maybe a generalization will be is feasible in the not-too-distant future either by Joerg or by Gabor at the request of Joerg.
Joerg, what are your thoughts on this?
A small correction to the spinglass.community algorithm http://igraph.sourceforge.net/doc/R/spinglass.community.html was noted by V.A. Traag and Jeroen Bruggeman, Community detection in networks with positive and negative link, Phys. Rev. E 80, 036115 (2009) https://bugs.launchpad.net/igraph/+bug/438554
Maybe we could ask one of them to work with Gabor on implementing the generalization to blockmodels if you are too busy.
Laplacian dynamics
Renaud Lambiotte, J.-C. Delvenne, M. Barahona Laplacian Dynamics and Multiscale Modular Structure in Networks arxiv: (Submitted on 9 Dec 2008]
- All the codes are available on http://www.lambiotte.be/codes.html
Peter J. Mucha1, Thomas Richardson2, Kevin Macon, Mason A. Porter4, and Jukka-Pekka Onnela 2009 Community Structure in Time-Dependent, Multiscale, and Multiplex Networks
Fast computation
Vincent D Blondel et al 2008 Fast unfolding of communities in large networks J. Stat. Mech. P10008
- Vincent D. Blondel, Jean-Loup Guillaume, Renaud Lambiotte, Etienne Lefebvre 2008 Fast unfolding of communities in large networks
Walktrap
WalkTrap is a community detection program written in C++ by Pascal Pons. It is based on the fact that a random walker tends to be trapped in dense parts of a network corresponding to communities.
Clustering
Clustering with Bregman Divergences Use VPN. Abstract: A wide variety of distortion functions, such as squared Euclidean distance, Mahalanobis distance, Itakura-Saito distance and relative entropy, have been used for clustering. In this paper, we propose and analyze parametric hard and soft clustering algorithms based on a large class of distortion functions known as Bregman divergences. The proposed algorithms unify centroid-based parametric clustering approaches, such as classical kmeans, the Linde-Buzo-Gray (LBG) algorithm and information-theoretic clustering, which arise by special choices of the Bregman divergence. The algorithms maintain the simplicity and scalability of the classical kmeans algorithm, while generalizing the method to a large class of clustering loss functions. This is achieved by first posing the hard clustering problem in terms of minimizing the loss in Bregman information, a quantity motivated by rate distortion theory, and then deriving an iterative algorithm that monotonically decreases this loss. In addition, we show that there is a bijection between regular exponential families and a large class of Bregman divergences, that we call regular Bregman divergences. This result enables the development of an alternative interpretation of an efficient EMscheme for learning mixtures of exponential family distributions, and leads to a simple soft clustering algorithm for regular Bregman divergences. Finally, we discuss the connection between rate distortion theory and Bregman clustering and present an information theoretic analysis of Bregman clustering algorithms in terms of a trade-off between compression and loss in Bregman information.
In soft clustering, each data point has a certain probability of belonging to each of the partitions.
This method does soft and hard clustering, hard being a {0,1} probability.
Keywords: clustering, Bregman divergences, Bregman information, exponential families, expectation maximization, information theory
Community structure
New: 2006 [http://sitemaker.umich.edu/michael.blum/files/0604016.pdf A mean-field analysis of community structure in social and kin networks] Eric Durand, Michael G.B. Blum, Olivier Fran¸cois. TIMB Department of Mathematical Biology, TIMC UMR 55
There exists a vast literature in the social sciences about various definitions of such cohesive groups. See Wasserman and Faust: Social network analysis for an overview.
Survey article on community detection: Communities in Networks by Mason A. Porter, Jukka-Pekka Onnela, and Peter J. Mucha]] (arXiv:0902.3788)
NetworksLiterature»Community Detection (incomplete list, and I know because I compiled it :) --- Mason)
new: V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre. Fast unfolding of communities in large networks. JSTAT, in press, 2008. http://arxiv.org/abs/0803.0476
new: J. Reichardt and S. Bornholdt. Clustering of sparse data via network communities - a prototype study of a large online market. J. Stat. Mech., page P06016, 2007.
new: "Algorithm finds the network-for genes" - J. Ruan and W. Zhang, Identifying network communities with a high resolution, Physical Review E, 77:016104, 2008.
missed: H. Zhou and R. Lipowsky. Network brownian motion: a new method to measure vertex-vertex proximity an to identify communities and subcommunities, pages 1062–1069. Springer- Verlag Berlin Heidelberg, 2004.
Modularity Q splitting or hierarchical clustering algorithm
One property of real world networks is their modular structure: certain groups of nodes are more densely interconnected internally than with the rest of the network. Often, they are referred to as "communities" or "cohesive subgroups" or simply "clusters". To find these is generally a hard combinatorial optimization problem. The "search space" is the set of all possible partitions of the network in groups, i.e. the set of all subsets, grows faster than exponential with the size of the network.
edge betweenness pre-print Community structure in social and biological networks, PNAS USA 99, 7821-7826 (2002) (15 pages)
http://arxiv.org/abs/cond-mat/0308217 Finding and evaluating community structure in networks, M. E. J. Newman and K. Girvan, Phys. Rev. E 69, 026113 (2004).
Fast algorithm for detecting community structure in networks Newman 2004 preprint Phys. Rev, 2004, E 69, 066133.
Modularity and community structure in networks, M. E. J. Newman, Proc. Natl. Acad. Sci. USA 103, 8577-8582 (2006).
Finding Community Structure in Mega-scale Social Networks Ken Wakita, Toshiyuki Tsurumi 2007
A Normalized and a Hybrid Modularity Haifeng Du, Douglas R. White, Yike Ren, Shuzhuo Li
Jordi Duch and Alex Arenas. 2005. Community detection in complex networks using extremal optimization Phys. Rev. E. 72(2) 027104 [4 pages]
Evolution of Degree
very fast Newman 2007
Mixing Patterns and Community Structure
http://www-personal.umich.edu/~mejn/
Newman M E J, Girvan K. 2003. Mixing patterns and community structure in networks. Pastor S, Rubi J, Diaz-Guilera A. Statistical Mechanics of Complex Networks. Berlin: R. Springer.
Betweenness and Community Structure
preprint Newman, M.E.J. and K. Girvin 2002 http://www.pnas.org/cgi/reprint/99/12/7821
Modularity Q agglomerative algorithm optimization
A. Clauset, "Finding local community structure in networks." Physical Review E 72, 026132 (2005).
A. Clauset, M. E. J. Newman and C. Moore, "Finding community structure in very large networks." Physical Review E 70, 066111 (2004).
Discovering Functional Communities in Dynamical Networks Authors: Cosma Rohilla Shalizi, Marcelo F. Camperi, Kristina Lisa Klinkner
Detection of Complex Networks Modularity by Dynamical Clustering Authors: S. Boccaletti, M. Ivanchenko, V. Latora, A. Pluchino, A. Rapisarda
Thermodynamics of Community Structure Authors: Claire P. Massen, Jonathan P. K. Doye]
Modularity Q Genetic algorithm optimization
[13] Tasgin, M.,"Community Detection Model using Genetic Algorithm in Complex Networks and Its Application in Real-Life Networks", MS Thesis, Graduate Program in Computer Engineering, Bogazici University, 2005
Spin Glass
This (part of the) article is heavily biased towards the work of J. Reichardt. Pls. improve by adding more references and discussions. ;-) Joerg
After the 2002 edge-betweeness paper by Girvan and Newman ("Community structure in social and biological networks", PNAS USA 99, 7821-7826 (2002) pre-print) quite some interest arose especially among physicists in the problem and there has been a surge of algorithms on how to detect communities.
An analogy to the magnetic systems known from physics, socalled spin glasses, can be used to solve the problem. Consider the links of the network as attractive couplings of spins and the missing links as repulsive couplings. Then search for the configuration in which minimizes the energy and you will find that the spins on densely connected nodes will align in the same direction while different, losely interconnected groups of spins will be oriented in different directions. With this technique, you can analyse systems of millions of nodes.
Here are some example applications:
Analysis of the energy landscape of a folding protein:
J. Reichardt and S. Bornholdt "Detecting fuzzy community structures in complex networks with a Potts model", Phys. Rev. Lett. 93, 218701 (2004) pre-print
Analysis of the interests of the bidders on eBay:
J. Reichardt and S. Bornholdt "Clustering of sparse data via network communities—a prototype study of a large online market", J. Stat. Mech. (2007) P06016, pre-print
An important problem using these techniques is that even completely random networks can be partitioned in a way that looks as if there were cohesive subgroups. This was first shown numerically by R. Guimera et al in "Modularity from Fluctuations in Random Graphs and Complex Networks", Phys. Rev. E 70, 025101 (2004) pre-print. However, the analytical estimates they present are inaccurate.
Here are some references on how to distinguisch between "true" and "spurious" modularity:
J. Reichardt and S. Bornholdt "Statistical Mechanics of Community Detection" Phys. Rev. E 74 (2006) 016110 pre-print
J. Reichardt and S. Bornholdt "When are networks truly modular?" Physica D 224(1-2) (2006) 20-26 pre-print
J. Reichardt and S. Bornholdt "Partitioning and modularity of graphs with arbitrary degree distribution" Physical Review E, 76(1) (2007) 015102 pre-print
Limits of modularity optimization
Modularity compares the number of edges inside a cluster with the expected number of edges that one would find in the cluster if the network were a random network with the same number of nodes and where each node keeps its degree, but edges are otherwise randomly attached. This random null model implicitly assumes that each node can get attached to any other node of the network. Such assumption is however unreasonable if the network is very large, as the horizon of a node includes a small part of the network, ignoring most of it. Moreover, this implies that the expected number of edges between two groups of nodes decreases if the size of the network increases. So, if a network is large enough, the expected number of edges between two groups of nodes in modularity's null model may be smaller than one. If this happens, a single edge between the two clusters would be interpreted by modularity as a sign of a strong correlation between the two clusters, and optimizing modularity would lead to the merge of the two clusters, independently of the clusters' features. So, even weakly interconnected complete graphs, which have the highest possible density of internal edges, and represent the best identifiable communities, would be merged by modularity optimization if the network is sufficiently large. For this reason, optimizing modularity in large networks would fail to resolve small communities, even when they are well defined. This bias is probably inevitable for methods like modularity optimization, which rely on a global null model.
Santo Fortunato and Marc Barthelemy, "Resolution limit in community detection" Proceedings of the National Academy of Science of the USA 104, 36-41 (2007).
Cluster overlap
“Improving the Clustering Capabilities of Consensus” 2006 Lecture Notes in Computer Science abstract “This paper introduces SOM as a clustering method to evaluate complex and uncertain knowledge in Consensus, a distributed security system for vulnerability testing; it proposes new metrics to evaluate the cohesion of every cluster, and also the cohesion between clusters; it applies unsupervised algorithms and validity metrics to a security data set; and it presents a method to obtain the best number of clusters regarding these new cohesion metrics: Intracohesion and Intercohesion factors.”
Moody and White 2003
Self-organizing network community
Self-Organizing Network Evolving Model for Mining Network Community Structure Book Series Lecture Notes in Computer Science Publisher Springer Berlin / Heidelberg ISSN 0302-9743 (Print) 1611-3349 (Online) Volume Volume 4093/2006 Book Advanced Data Mining and Applications Copyright 2006
See also
Structural cohesion -- Intercohesion -- Clique percolation -- Mark Newman -- Haifeng Du
( Community structure was absorbed into and now points to Community detection )
