Scott D. White


Thurs 11PM Oct 13-17 Alaska 496 10:18 PM Hayden-Claire Joan Friday

  • Scott 12243 Dayton North Seattle 98133 - Scott-206 384 9439 Katie-206 303 7776 / Catherine Keefe
Catherine Keefe Halloween Birthday 30th Oct

9010 Phinney Ave N, Seattle, WA 98103


Sent you in Facebook messenger but copying here: “The figure below is one of the most amazing demonstrations of the power of mathematics I've ever seen.

Here's how it was made (explanation requires university-level linear algebra to fully understand). About 1000 people of European ancestry had blood samples taken, which were sequenced for about 200,000 genetic markers. This produces a matrix of size 1000 x 200,000.

From this matrix, produce a covariance matrix: multiply the matrix by its transpose to get a 1000 x 1000 matrix. Now plot a histogram of the eigenvalues. If the original big matrix of data were totally random, this histogram would follow a precise law known as the Marchenko-Pastur distribution: one of the cornerstones of random matrix theory.

What you actually see is that practically all of the eigenvalues follow the random matrix distribution perfectly, except for two very prominent outliers. Each of these two outlier eigenvalues has an associated eigenvector of dimension 1000 (one entry for each subject in the study); so putting these two eigenvectors together gives a coordinate in 2-dimensional space for each subject.

Below is the plot of those 1000 points in the plane. They are also colored according to the country of origin of the subject's grandparents (which the researchers also noted when collecting the data, but did not use in the analysis until after the plot was made).

As you can see: purely genetic information amazingly accurately encodes the geographic information of the subjects' ancestors. THERE'S A MAP OF EUROPE HIDDEN IN THEIR GENES. And it's amazingly detailed: if you look closely, the genes even encode, geographically precisely, whether Swiss subjects' ancestors came from French speaking, German speaking, or Italian speaking regions of Switzerland.

Genes mirror geography within Europe. This was published in Nature in 2008.

I only learned about this last week, in a phenomenal talk by Stanford statistician and former MacArthur Fellow David Donoho. His talk launched from here, using even more sophisticated random matrix theory to explain finer details of the map (it's not perfect: Italy and Spain are too wide, and we now understand exactly why, thanks to the analysis of spiked covariance random matrix models recently completed by some of my colleagues).

It is amazing that this information is encoded in our genes. What's even more amazing is that finding this signal there in the data is the result of some very sophisticated mathematics, in the field I work in: a field which was born out of a failed attempt to explain the strong nuclear force in physics, and has been developed largely in a purely theoretical world for decades. Now, it is at the heart of signal processing, wireless communications, and (apparently) population genetics.

If you ever need a reason why we (mathematicians) do what we do, this is it. Not that we're looking for these connections in our work every day. But the fact that sophisticated mathematics leads to such amazing real-world discoveries vindicates every overly-abstract paper we write.”
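
The construction described in the quoted post can be illustrated on a small scale. The following is only a sketch on synthetic data, not the study's data or its actual pipeline: the sizes (40 subjects, 400 markers), the planted "east" and "north" coordinates, the Gaussian noise, and the function names simulate, gram_matrix, top_eigenpairs and abs_cosine are all assumptions made for the example. It also uses plain power iteration with deflation in place of a full eigendecomposition, so that it runs with the Python standard library alone. With unit-variance noise, the Marchenko-Pastur upper edge is (1 + sqrt(n/p))^2, and the two planted directions should show up as eigenvalues well above it.

```python
import math
import random

def simulate(n, p, seed=1):
    """Synthetic n x p 'genotype' matrix: two hidden coordinates plus unit noise."""
    rng = random.Random(seed)
    east = [rng.uniform(-2, 2) for _ in range(n)]    # hypothetical wide axis
    north = [rng.uniform(-1, 1) for _ in range(n)]   # hypothetical narrow axis
    a = [rng.gauss(0, 1) for _ in range(p)]          # marker loadings on east
    b = [rng.gauss(0, 1) for _ in range(p)]          # marker loadings on north
    X = [[east[i] * a[j] + north[i] * b[j] + rng.gauss(0, 1) for j in range(p)]
         for i in range(n)]
    return X, east, north

def gram_matrix(X, p):
    """Return S = X X^T / p (n x n) for X given as a list of rows."""
    n = len(X)
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            S[i][j] = S[j][i] = sum(x * y for x, y in zip(X[i], X[j])) / p
    return S

def top_eigenpairs(S, k, iters=300, seed=0):
    """Approximate top-k eigenpairs of a symmetric matrix (power iteration + deflation)."""
    rng = random.Random(seed)
    n = len(S)
    A = [row[:] for row in S]
    pairs = []
    for _ in range(k):
        v = [rng.gauss(0, 1) for _ in range(n)]
        for _ in range(iters):
            w = [sum(a * x for a, x in zip(row, v)) for row in A]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        Av = [sum(a * x for a, x in zip(row, v)) for row in A]
        lam = sum(x * y for x, y in zip(v, Av))      # Rayleigh quotient
        pairs.append((lam, v))
        for i in range(n):                           # deflate: A -= lam * v v^T
            for j in range(n):
                A[i][j] -= lam * v[i] * v[j]
    return pairs

def abs_cosine(u, v):
    """Absolute cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return abs(dot) / math.sqrt(sum(x * x for x in u) * sum(y * y for y in v))

n, p = 40, 400
X, east, north = simulate(n, p)
S = gram_matrix(X, p)
pairs = top_eigenpairs(S, 3)
lams = [lam for lam, _ in pairs]
mp_edge = (1 + math.sqrt(n / p)) ** 2                # edge for unit-variance noise

# One 2-D point per subject, built from the two outlier eigenvectors.
coords = list(zip(pairs[0][1], pairs[1][1]))

print("Marchenko-Pastur upper edge:", round(mp_edge, 3))
print("three largest eigenvalue estimates:", [round(l, 3) for l in lams])
print("|cos(eigenvector 1, planted east)|:", round(abs_cosine(pairs[0][1], east), 3))
```

The third estimate is only rough, since power iteration converges slowly inside the bulk of the spectrum. On real genotype data the markers would normally be centered and scaled before this step, which is not shown here.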


Now at Salesforce.

Former senior architect at Radar Networks, working on Semantic Web construction. I'm also a PhD student at UC Irvine, advised by Padhraic Smyth. I have worked as an applied scientist at Amazon and Yahoo!. I have also worked as a manager, consultant, and software engineer in the areas of bioinformatics, military simulation, derivatives arbitrage, and credit card scoring.

Former principal scientist at Visible Technologies, designing new ways to make sense of the billions of conversations happening on the web. I also provide consulting services around data analytics and intelligent systems.

Salesforce Analytics


  1. S. White, P. Smyth. Algorithms for Discovering Relative Importance In Graphs. KDD, Washington D.C., 2003.
  2. S. White, P. Smyth. Algorithms for Estimating Relative Importance In Networks. Journal of Intelligence Community Research and Development, 2004.
  3. Joshua O'Madadhain, Danyel Fisher, Scott White, and Yan-Biao Boey. The JUNG Framework. Technical Report UCI-ICS 03-17.
  4. Scott White, Padhraic Smyth. Markov Chain Algorithms for Determining Relative Importance in a Graph. Technical Report UCI-ICS 04-25.
  5. S. White, P. Smyth. A Spectral Clustering Approach To Finding Communities in Graphs. SIAM Data Mining Conference, 2005.
  6. Joshua O'Madadhain, Danyel Fisher, Padhraic Smyth, Scott White, and Yan-Biao Boey. Analysis and Visualization of Network Data Using JUNG.
  7. Douglas R. White, Natasa Kejzar, Constantino Tsallis, Doyne Farmer, Scott White. A Generative Model For Feedback Networks. Physical Review E, 016119, 2006. Final paper in PDF as SFI working paper.

Matlab code contributed by Scott D. White

Human Kinship Project

How many people project


Katie White Claire Sofia White Hayden S. White Stockholm