Estrada's subgraph centrality, communicability, community detection

From InterSciWiki

Jump to: navigation, search

Contents

[edit] Author

Ernesto Estrada

[edit] MatLab programs by

James McNerney

[edit] Comissioned by

Douglas R. White for fall 2009 talk for NAACSOS meeting

[edit] Reference

Kolaczyk, Eric D. Statistical Analysis of Network Data.

See pp 106-forward Spectral decomposition, becomes harder with more components.

[edit] Quik_Math

http://meta.wikimedia.org/wiki/Help:Formula

[edit] Subgraph centrality

CS(i) = eλ For steps 3 to 6 below (Copy and paste to MatLab)
load Zachmat.dat
A = Zachmat;
[V,lambda] = eig(A); %1 Compute the eigenvectors and
lambda = diag(lambda); %2 eigenvalues.
V2 = V.^2; %3 Matrix of squares of the eigenvectors elements.
Cs = real(V2 * exp(lambda)); %4 Compute eigenvector centrality. Lop off imaginary part remaining due to precision error.
Cs

[edit] Comparison to Cohesive blocking

Compare cohesive blocks in the Zachary karate club to subgraph centrality.


[edit] Completed McNerney code

6 September 2009: I've added fuller matlab code below for several of the quantities used in Ernesto Estrada's papers on communicability and subgraph centrality. --James McNerney

Copy and paste the code but not the heading labels!

[edit] Subgraph centrality Cs

Estrada, Ernesto, and Juan A. Rodriguez. 2005. Subgraph centrality in Complex Networks. Phys. Rev. E 71 (2005) 056103. 9 pp.

C_S(i) = \sum_{k=0}^{\infty} \frac{(\mathbf{A}^k)_{ii}}{k!}
%%function Cs = sg_centrality(A)
% SG_CENTRALITY(A) computes the subgraph centralities of the network 
% represented by the adjacency matrix A. It returns a vector of subgraph
% centralities for each node of the network (as they are ordered in the
% adjacency matrix.)
%%%Example of data entry
load Zachmat.dat
A  = Zachmat;


[V,lambda] = eig(A);                   % Compute the eigenvectors and
lambda     = diag(lambda);             % eigenvalues.
V2         = V.^2;                     % Matrix of squares of the eigenvectors elements.
Cs         = real(V2 * exp(lambda));   % Compute eigenvector centrality. Lop off imaginary part remaining due to precision error.
Cs

[edit] Communicability G

Estrada, Ernesto, and Naomichi Hatano. 2008. Communicability in Complex Networks. Phys. Rev. E 77 036111. 12pp. G_{ij} = \sum_{k=0}^{\infty} \frac{(\mathbf{A}^k)_{ij}}{k!}

%%function G = communicability(A)
% COMMUNICABILITY(A) computes the communicability of pairs of nodes in the
% network represented by the adjacency matrix A. It returns a matrix whose
% elements G(i,j) = G(j,i) give the the communicability between nodes i and
% j.

G = expm(A);    % Compute the matrix exponential of A.
G

[edit] ΔG Overlapping community detection

%%function DG = Delta_G( A )
% DELTA_G(A) computes the matrix "delta G" (a matrix related to the
% communicability matrix) for the network with adjacency matrix A. Finding
% cliques in the signed version of this matrix is equivalent to finding
% communities in the original adjacency matrix.

[phi,lambda] = eig(A);
n            = length(A);
phi1         = phi(:,n);      % 'eig' orders eigenvalue in increasing order,
lambda1      = lambda(n,n);   % so the nth eigenvalue is the largest
first_term   = phi1 * phi1' * exp(lambda1);  % Matrix of first terms in the decomposition of G.
G            = expm(A);                      % The communicability.
DG           = G - first_term;
DG

[edit] Homogeneity ξ

%%function xi = homogeneity(A)
% HOMOGENEITY(A) computes Estrada's measure of homogeneity for a network
% with adjacency matrix A.

% Compute Cs_odd values for all nodes.
[phi,lambda] = eig(A);                    % Compute the eigenvectors and
lambda       = diag(lambda);              % eigenvalues.
phi2         = phi.^2;                    % Matrix of squares of the eigenvectors elements.
Cs_odd       = real(phi2 * sinh(lambda)); % Compute eigenvector centrality. Lop off imaginary part remaining due to precision error.

% Compute xi.
n            = length(A);
phi_1        = abs(phi(:,n));    % The eigenvector of the largest eigenvalue.
lambda_1     = lambda(n);        % The largest eigenvalue.
residuals    = log10(phi_1) - 0.5*log10(Cs_odd) + 0.5*log10(sinh(lambda_1));
xi           = sqrt( sum(residuals.^2)/n );
xi

[edit] Plot - Run Homogeneity first!

%%% Plot %%%
% Compute vectors for plotting line of perfectly homogeneous network.
Cs_low  = min(Cs_odd);
Cs_hi   = max(Cs_odd);
Cs_plot = [Cs_low Cs_hi];
b       = sinh(lambda_1)^(-.5);

% Plot.
loglog(Cs_odd, phi_1, 'o')
hold on
plot( Cs_plot, b*Cs_plot.^(.5), 'k--')
hold off
set(gca, 'FontSize',16)
xlabel('C_s^{(odd)}(i)')
ylabel('phi_1(i)')
title( ['\xi(G) = ',num2str(xi)] )

[edit] Community detection

Estrada, Ernesto, and Hatano, Naomichi. 2009. arXiv Communicability Graph and Community Structures in Complex Networks.

According to Kolaczyk, Eric D. 2009 Statistical Analysis of Network Data. Methods and Models Series: Springer Series in Statistics pp. 106-108, the spectral decomposition only works under some case of simple components and the relation to Green's function for spring networks is not proven. But the relation to the cycles is interesting. --- Michael Koenig

Estrada's algorithm for community detection:

  1. Calulate ΔG matrix (see code above).
  2. Map negative elements of ΔG to 0, and positive elements to 1. This produces the adjacency matrix of the communicability graph associated with the original graph.
  3. Find cliques in the communicability graph. These cliques correspond to communities of the original graph.

[edit] Example: Zachary's karate club

See last part of : NAACSOS plenary talk

Visualizations produced using fr_graph_viz.m by Scott White. These reproduce the two graphs in Estrada and Hatano's Fig. 2. (James McNerney: For the figures I uploaded to the wiki, I modified Scott's code to produce larger node dots and larger node labels. I can tell you how to this if you want to do likewise; a good general rule to do this is to look for lines of code containing property names like "MarkerSize" and "FontSize" as arguments, and then fiddle with their property values (which follow right after the property name.)


Sample code to produce these figures:

load Zachmat.dat
A = Zachmat;               % Original graph
A_cg = (delta_G(A) > 0);   % Communicability graph
figure(1)
fr_graph_viz(A)
set(gca, 'Visible','off')
set(gca, 'Position',[0 0 1 1])
# A_cg = (DG > 0);  
figure(2)
fr_graph_viz(A_cg)
set(gca, 'Visible','off')
set(gca, 'Position',[0 0 1 1])

[edit] Notes on the spring embedder

James McNerney: fr_graph_viz detects how many parameters are passed in using a built-in function of matlab called 'nargin', standing for 'number of arguments in'. If you pass only A to fr_graph_viz, it will detect that only the first argument was passed in, and assign default values to the others. Effectively the function is overloaded; and should run with any of the following variants

fr_graph_viz(A)
fr_graph_viz(A,labels)
fr_graph_viz(A,labels,display) ...
fr_graph_viz(A,labels,display,width,height,threshold,maxIterations)
fr_graph_viz(A,labels,display,width,height,threshold,maxIterations,fixNodePos)

So passing in A only is fine; the other parameters will be defined by operations inside of fr_graph_viz. The more elaborate versions of the function are made available in case you want to override the default values of the parameters, but fr_graph_viz(A) should work fine. I didn't change the other parameters when I generated the figures for the wiki (though as I said I did reach inside the code and modify some parameters that weren't made optional, like the node dot size.)

--For the figures I uploaded to the wiki, I modified Scott's code to produce larger node dots and larger node labels. I can tell you how to this if you want to do likewise; a good general rule to do this is to look for lines of code containing property names like "MarkerSize" and "FontSize" as arguments, and then fiddle with their property values (which follow right after the property name.)--

[edit] Commentary

Discussion: Michael Koenig and Doug White: The Estrada index takes the diagonal

C_S(i) = \sum_{k=0}^{\infty} \frac{(\mathbf{A}^k)_{ii}}{k!}


On the other hand the Bonacich centrality can be decomposed in diagonal and off-diagonal components. The diagonal component takes into account the cycles of a node while the off-diagonal component measures all outgoing paths from a node not coming back to that node.

C_B(i) = \left(\sum_{k=0}^{\infty} \Phi^k \mathbf{A}^k \mathbf{u} \right)_{i}  = \sum_{k=0}^{\infty} \left( \underbrace{\Phi^k \left(\mathbf{A}^k \right)_{ii}}_{\text{cycle term}} +  \underbrace{\sum_{j \ne i} \Phi^k \left(\mathbf{A}^k \right)_{ij}}_{\text{out-paths}} \right)

See also this decomposition in Yves Zenou (Stockholm University and IFN), Lesson 3, Games on Networks

  • Michael: An idea would be to take a dataset like the one in this study of peer effects with effects decomposed by the first and second terms of the equation above: Peer Effects and Social Networks in Education Antoni Calvó-Armengol, Eleonora Patacchini, Yves Zenou. 2006 Review of Economic Studies (2009) 76, 1239–1267.
e.g., run a logistic equation for the dependent variable (educational outcome-school performance; alternately link formation) estimating the coefficients in the two components of the eqn. above.
  • Doug: There might also be a middle term for those ij paths that do not terminate in the same community, as computed from Estrada's community detection, main components.
  • Michael: Could use data from alliance networks of firms 1980-now, two datasets Thompson SDC, other RECAP, similar to Powell's biotech, patent data with measure of performance, or can construct an inventor network (2-mode data on patents and inventors). Targets Links or Productivity. See Tatarinowicz data (not published).
  • If it is easier to do productivity outcomes, patent productivity outcome is knowledge, then number of patents ("degree distribution") might be a good outcome to predict with regression, if distribution is power-law then dependent variable is log(#patents) per inventor. Possible data: Tatarinowicz; Jason Owen-Smith. Worldwide Patent data base. CompSci group at Zurich taking care of the database.

For longer cycles see Dynamics of R&D Collaborations in IT industry - Nabuku Hanaki, with Ryo Nakajima and Yoshiaki Ogura, under review.

See also: Ranjay Gulati SRN and length of cycles

^Network analysis: methodological foundations By Ulrik Brandes, Thomas Erlebach

Chapters on random walk betweenness (Newman) and closeness (Stephensen and Zelen)
Google: ulrik brandes random walk centrality