  • Leghorn Merchant Networks - Antenati Livornesi (95,798 persons)
  • Project to save the Livorno cemeteries
  • This is an academic research project trying to reconstruct foreign merchant families networked with Livorno (Leghorn) but also connecting with other Mediterranean ports like Smyrna, Aleppo, Tunis, Gibraltar, Genoa, Marseille, Naples, Messina, Malta...and with main business centers like London, Frankfurt, Hamburg, Milan, Paris, Philadelphia, Boston...(and also Turin)
The database also includes thousands of French Huguenots and other foreigners established in Geneva, London, Amsterdam, Hamburg, Bordeaux and many other places.

Our respective home pages

Image 2. These are the structurally endogamous families, from Image 1.

Tue, January 26, 2010 3:07 pm Matteo

Hello Doug,

Sorry for being late. I am having some hard moments lately but I am still doing my research. Just let me know what is going on and if you come to something interesting with the analysis on my database. It will probably be necessary in the future to rearrange the data file to a more consistent one without spurious branches and mistakes.


Monday, January 18, 2010 10:40 PM Doug

Subject: Good news

I have a new server now that doesn't freeze up at the size of your Livorno networks. Since I have to give a networks talk a month from now, that will probably mean that I finally make some more progress in the figures and findings....

happy new year, best wishes

Doug White

Thu, November 12, 2009 11:42 pm Matteo

Doug, I am afraid we are not understanding each other on this. When I talked about the multiple ascendants, it was about a graphical tree that my software can produce. There's an option there, to draw or not the "implèxe". I found a wikipedia page in English on this:

If I trace these ancestors multiple times or just one time doesn't change the database itself. The ancestors are there. I am not sure of getting the meaning of "consolidated", can you explain to me better what you mean ?
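The difference between drawing repeated ancestors and storing them once can be shown with a small sketch. It assumes a hypothetical `parents_of` mapping from each person to the list of their recorded parents; the names and the tiny pedigree are invented for illustration and are not taken from the Livorno database. The first function counts positions in a drawn tree, where a shared ancestor appears once per line of descent; the second counts distinct people, which is what the database holds either way.

```python
def ancestor_slots(person, parents_of):
    """Count ancestor positions in a drawn tree (a shared ancestor is counted on every line)."""
    return sum(1 + ancestor_slots(p, parents_of) for p in parents_of.get(person, ()))


def distinct_ancestors(person, parents_of):
    """Collect each ancestor once, however many lines of descent lead to them."""
    seen = set()
    stack = list(parents_of.get(person, ()))
    while stack:
        ancestor = stack.pop()
        if ancestor not in seen:
            seen.add(ancestor)
            stack.extend(parents_of.get(ancestor, ()))
    return seen


# Hypothetical pedigree: the child's parents are first cousins (their fathers are brothers).
parents_of = {
    "child": ["father", "mother"],
    "father": ["pgf", "pgm"],
    "mother": ["mgf", "mgm"],
    "pgf": ["gg1", "gg2"],
    "mgf": ["gg1", "gg2"],
}

slots = ancestor_slots("child", parents_of)
people = distinct_ancestors("child", parents_of)
```

Here the drawn tree has more positions than there are distinct people, because the shared great-grandparents are reached through both parents. The recursive version is only suitable for small examples; a pedigree as deep as the full database would need an iterative count.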

Did you check which people correspond to those two dots or male lines you were telling me ?

Doug: Those are the Royals.

About the Royals... I am very confused on how to proceed. There's no easy way to manage them. I am a little afraid of just telling the program to delete the ascendants of Louis XVIII of France or others... there could be some important links between people passing through them. Couldn't we, instead, just cut off all the individuals before a certain date like 1300 A.D.?

Doug: That's easy in the Pajek graph, which is ordered precisely by the directed acyclic graph ordering mentioned in the Wikipedia:Pedigree collapse article. Each level has a number and Pajek can show those numbers, then we can delete all the upper generations. The "dots" in the graphs are nodes for couples, so each will have a link to the wife's parents and separately to the husband's parents above, then to sons and daughters or their marriages below. We don't delete from the GEDCOM but from one version of the *.net or *.paj file, and there can be many such files with different views of the network.
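As a rough illustration of numbering generations and removing the upper ones, the sketch below works on a hypothetical `children_of` mapping in which each node is a couple and points to the couples formed by its children. It is not Pajek's own layering: as an assumption, it numbers each couple by the longest chain of descendants beneath it (0 for a couple with no recorded children) and drops every couple above a chosen height. The function names and the sample data are invented for the example, and the input is assumed to be acyclic.

```python
def heights(children_of):
    """Longest chain of descendants under each node (0 = no recorded children)."""
    nodes = set(children_of)
    for kids in children_of.values():
        nodes.update(kids)
    height = {}
    for start in nodes:
        stack = [start]
        while stack:
            node = stack[-1]
            kids = children_of.get(node, ())
            pending = [c for c in kids if c not in height]
            if pending:
                # Visit unfinished children first; assumes the graph has no cycles.
                stack.extend(pending)
            else:
                stack.pop()
                height[node] = 1 + max((height[c] for c in kids), default=-1)
    return height


def drop_upper_generations(children_of, max_height):
    """Return a pruned copy of the graph, leaving the original mapping untouched."""
    height = heights(children_of)
    keep = {n for n, h in height.items() if h <= max_height}
    return {
        parent: [c for c in kids if c in keep]
        for parent, kids in children_of.items()
        if parent in keep
    }


# Hypothetical couples: a royal line grafted on top of a merchant lineage.
children_of = {
    "royal_1": ["royal_2"],
    "royal_2": ["merchant_1"],
    "merchant_1": ["merchant_2"],
    "wife_parents": ["merchant_2"],
    "merchant_2": ["merchant_3"],
}

pruned = drop_upper_generations(children_of, max_height=2)
```

Because the function returns a new mapping, the source data stays intact and several pruned views can coexist, in the same spirit as keeping the GEDCOM whole and editing only a copy of the network file.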


P.S. did I tell you that I posted an article on our cooperation on my blog ?

Doug: Thanks, I just viewed it at

Wed, November 11, 2009 7:34 PM Doug

Subject: Re: GED file fixed & attached

To take a timeout on my work - The Pajek plots are strictly ordered by time. No problem with the Royals, but in the GEDCOM that you sent and I had fixed, are these ascendants still there multiple times, or are they consolidated so we can see the intermarriages? The latter is better, but no problem the other way, they can be consolidated.

Fri, November 6, 2009 1:35 pm

Hi Doug,

I tried to understand your question but I am afraid I need first some understanding of what Pajek actually does in relation to the logic of a genealogical structure. I also have the feeling that the "top" of the graph (Royal lines added for personal interest; Matteo also mentions ancestors there "multiple times") has no relationship to time. It would be better if I could have the Pajek analysis file so that I can try to check the dots. So for now I am not able to answer your question. I'd also like to understand if Pajek is able to visualize data in a more chronological order: is there a way to make it trace, for example, a selected descendancy in a sort of "tree" graph ? Is it able to represent for example a descendancy tree of one individual where, for each descendant, it takes into consideration all his ascendants? (this is to represent for example the ascendant trees of all the wives of all descendants of a given individual).

As to your other question, the dates of the earliest ancestors, the answer is quite ridiculous: a few years ago I entered, for some kind of personal enjoyment and curiosity, the ascendants of some royal lines, thus taking the database back to the Carolingians and Capetians... this forces the answer to be: about 400-500 A.D. - If we want a more serious dataset, we can cut those lines and let the database go back just until 1100-1200 A.D. which is a little more consistent with possible documents (notarial only of course). As to specific Livorno merchants' ancestors, the most common date I reach after some research is, I would say, the late 1500s.


Sun, November 1, 2009 9:43 am

That's it for a while for images and analysis as I have to get back to my teaching, grading, writing and conferences schedule. I put a reply to a question of yours in indented format below. By now you should have the new GEDCOM format readable in Pajek and an answer from A.M. as to why your GED exports from the (French) Heredis 10 Pro software (from BSD Concept) don't export correctly.

Sun, November 1, 2009 7:57 am - 2nd Image: The 16,351 endogamous families (1/3rd) of the Livorno kinship network

This is an amazingly large percentage of endogamous families: fully 1/3rd of the total.

Image 1. Each vertical column is a patriline, the dotted arrows are maternal lines in a P-graph, with 58,308 families. The 75 horizontally colored circles are minimal temporal units needed to keep parents just above children and ancestors above descendants.

Sun, November 1, 2009 7:40 am - First Image: The lineages and intermarriages of 58,308 families of Livorno

To A.M.: The amazing thing is that when I did a 3-D image with generational layers in the z direction from generational partition I got not only a perfect image but ALL THE MALE LINEAGES WERE PERFECTLY VERTICALLY ALIGNED !!! THAT IS NEW AND A REAL ACHIEVEMENT IN YOUR VISUALIZATION SOFTWARE.

Sun, November 1, 2009 2:49 am - Help from A.M.

A truly collaborative enterprise. Works in Pajek. Thanks to A.M. Now we can use Pajek to make graphs.

---------------------------- Original Message ----------------------------


Thanks for good wishes. I hope you and Lilyan are well too, and that you had good time in Seattle.

Non-numerical characters are not allowed as identifiers for individuals and families in Pajek.

INDI tags in your file were like: 0 @1I@ INDI. I changed them to: 0 @1@ INDI

FAM tags were like: 0 @2432U@ FAM. I changed them to: 0 @2432@ FAM

Corrected file is attached.
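The fix described above can be sketched in a few lines. This is a hedged illustration, not the procedure A.M. actually used: it assumes every identifier is written between two `@` signs and simply drops the non-digit characters, both where a record is defined and where it is referenced (for example on `FAMS` or `HUSB` lines). The helper name `numeric_xrefs` is invented for the example.

```python
import re

# Matches a GEDCOM cross-reference such as @1I@ or @2432U@.
XREF = re.compile(r"@([^@\s]+)@")


def numeric_xrefs(line):
    """Drop every non-digit character from the identifiers on one GEDCOM line."""
    def strip_letters(match):
        digits = re.sub(r"\D", "", match.group(1))
        # Leave the identifier alone if it contains no digits at all.
        return f"@{digits}@" if digits else match.group(0)
    return XREF.sub(strip_letters, line)


sample = ["0 @1I@ INDI", "1 FAMS @2432U@", "0 @2432U@ FAM", "1 HUSB @1I@"]
converted = [numeric_xrefs(line) for line in sample]
```

After this change an individual and a family may carry the same number. The examples in the message suggest that this is acceptable, since the record type still tells them apart, but that is an assumption worth checking on a copy of the file before converting the whole export.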



> I would so much appreciate if you could help me once again to read a giant
> gedcom... It gets thru a first series of 88000 then in the second series
> ends and the error is family names or some such ... error like - genealogy
> could be wrong and ends without inputting data into Sept 2009 Pajek.
>
> best to you and family
> Doug

Sat, October 31, 2009 5:26 pm Matteo follow-up and Doug's answers

Hello again,

How does this matching program work ? John's project looks extraordinary and I am astonished that I never heard of it in Italy. Did people in Italy give him the necessary attention or it happened like with you and Lille faculties ?

Mostly I would guess like me and the Lille faculties -- quite legitimately, too busy with their own work to add more. I happen to have found some methods and shortcuts that make the computational work easy but the genealogical data collection and background data collection is hard and requires deeper motivation.
That's understandable up to a certain degree. I believe even if people are busy with some other work, the concept of having digitized an entire population for a couple of centuries... it's not a work like another one... it's extraordinary and I think there should be a strong interest on it ! You are also saying a big truth about the ultra-strong motivation needed for large scale genealogical data entry... I can tell it on my own experience... (Matteo)
Yes, I agree, it's not a work like any other and has immense lasting historical value, as John Padgett is aware, and changes history forever. I spent about six months on my wife's Austrian village genealogical data, the same on the Turkish nomads' data, and the same for Nord-Pas-de-Calais where I put each family into geocoordinates to do spatial as well as network analysis. Each of these databases and many others is online so should not be lost even if I don't publish on some of them.

I also had some contacts with Italian professors and faculties, and apart an initial apparent interest there was no action or further communication.... I really don't get it much. I have even been told to keep the database as secret as possible since any student/professor could use it to publish articles at my place... well, my goal would be to have it published on the web, inside an academic website or sponsored by a faculty and open to all with different access to researchers and public so to make it evolve with the contributions of all, does this seem crazy or without interest ?

For some irrational reason (easily understandable given the general academic paranoia perhaps), academics like to keep their data secret, even long after they have published on it. Scientists are different -- after the initial publication(s), within 1-2 years, having published, the canon of science is that your research should be replicable, hence they release the data in some form.

I'd like to know your opinion on these websites and graphical engines:
Both require Java or JDK expertise. Looks great, these are my son's fortes not mine.
It's beautiful, it's postmodern, 3d, dynamical, and by one of the 21 heroes of visualization, but it's also suggestive rather than analytical. Requires you to hire the artist/programmer. It's also quite egocentric -- one person's genealogy moving through a forest of trees. A forest of trees doesn't even capture how genealogy is organized.
This is only one of many dozens of SNA programs, this one by Greek computational mathematician Dimitris V. Kalamaras. For SNA (social network analysis) I usually go with what I know.
This is the trendiest of all the visualizations, a really brilliant guy, lots of kudos, who has a scientific mind and follows popularizers in complexity like Barabasi whose work unfortunately is partly fraudulent when it concerns his book on scale-free networks. There is a lot of glitz and flash here but little real substance scientifically. There is little in any case that applies to genealogy.

I guess there's a need of marking some families/trees to be able to analyse and visualize differences and patterns.

None of these people or their software get it -- kinship, genealogy, exchange networks have particular structures and dynamics, not one but many types. We need to go down our own path, and study what these are historically and figure out the reality and historicity of all that you and others have collected in this project. There is time for pretty pictures later. Better to keep it real. Thanks for sending the GEDCOM: once you see what we can do with it once it is debugged you'll see how time and complexity are built in.


Sat, October 31, 2009 4:52 pm Matteo

That's simply great. Until I saw it I was not sure about posting the e-mails but I am already fond of it ! I just hope there will be some benevolence about my less than perfect English...

Let's get into work then. I have extracted a "clean" GEDCOM from my database: I cut off all individual/family research notes but there are other fields that may be cleaned up as well like a "private" tag on living people and some other kinds of tags of which I am not very knowledgeable. Tell me if this ged is clean enough or if it needs some more treatments. (One note: I am working everyday on the database so it happens that I add or correct few hundreds people each month, how do we manage this ?)

I exported ALL individuals and families, even those not connected to others but included in the database for particular reasons or simply waiting for a connection. A particularity of the original database are non-genealogical links, for example I created some fake family names like (dipl), (loge), (soc)... with some individuals belonging to that fake surname like "consuls" or "Lodge Great Orient of France"... these represent societies, masonic lodges, establishments, diplomatic professions... and they were made so that I could try to link multiple individuals to this one fake individual. Other non-genealogical links include life events for which my software can create a link to somebody else who was present at the event or who is somehow involved in it; moreover there are also kinship links which are not yet materially established: I know A is uncle of B but I still didn't determine how, so I can put a link between them of uncle/nephew type.

I am not sure if this information is helpful or necessary but I thought so, tell me if there's anything else you need to know. Oh, by the way, the text is in ANSEL format, but I can also export in other formats like MsDos, Mac, ANSI, etc...

Towns, cities, villages are not always standardized. My recently updated software now includes a place database to help out in standardizing locations, but it has only France, Canada and some French territories. In many cases I also standardized British, Italian, American cities, etc... but there are some inconsistencies or places which are not yet standardized, which are momentarily treated with a "subdivision" field normally used to add a detail to the location, like a church or cemetery name, or a "lieu-dit". Profession and Title fields are very often used as if they were the same, due to the architecture of the program, which does not show both fields during fast data entry but does during normal data entry, so this caused the inconsistency.

Lastly: there must be some substantial difference on how gedcom treats families in relation to my other database software because I noticed these differences in the two data sets before and after export: individuals: original (90702), GED (90443) families: original (35791), GED (41377) I think this may be related to some individuals missing one parent and the GED export adding the missing one as an unknown individual. (?)
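One way to look into the discrepancy is to count the top-level records in the exported file directly and compare the totals with the figures the export reports. The sketch below is a minimal tally, assuming that each record begins with a level-0 line of the form `0 @id@ TAG`; the function name `count_records` and the sample lines are illustrative only.

```python
from collections import Counter


def count_records(lines):
    """Tally level-0 records by tag (INDI, FAM, ...) from GEDCOM lines."""
    counts = Counter()
    for line in lines:
        parts = line.strip().split(maxsplit=2)
        # A record header looks like: 0 @xref@ TAG
        if len(parts) == 3 and parts[0] == "0" and parts[1].startswith("@"):
            counts[parts[2].split()[0]] += 1
    return counts


sample = [
    "0 HEAD",
    "0 @1@ INDI",
    "1 NAME Matteo /Giunti/",
    "0 @2@ INDI",
    "0 @1@ FAM",
    "1 HUSB @1@",
    "0 TRLR",
]
totals = count_records(sample)
```

For a real export the lines would come from the file itself. Since only the ASCII tags matter for counting, opening the file with `encoding="latin-1"` is a workable assumption even when the text is in ANSEL, because that codec can decode any byte.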

Size (just for info): original about 70Mb, GED without notes about 20Mb, zipped less than 4Mb.

P.S. it happened in the past that I also added some Kings/Queens and other members of royal families for different reasons. These families are far from being complete on purpose


Sat, October 31, 2009 3:09 pm Doug

I put our emails on the wiki, feel free to edit out or edit whatever you want or erase the whole thing, but it's easier to keep track of what we are saying to each other that way, without personal contact.

Could you send me the complete GEDCOM for the whole project, Livorno, Huguenot and the rest? The larger the better. We can handle 100s of thousands of individuals or families. I will not distribute any of it of course, it remains yours entirely but I can begin analysis of the network structure and visualization.

Sat, October 31, 2009 1:03 pm Matteo

Hello Doug,

  • I already made a "blog" about the project about 10 days ago: I would be very happy if we use that one as well.

We can exchange information on your wiki but if there's something relevant I'd love to post it on the other blog. Let me know what you think, this is the first time I do a kind of explanation of my project.

I'll get to your first e-mail after dinner...


Sat, October 31, 2009 12:46 pm Doug follow-up

Matteo (I go by first names)

I find it useful in collaborations to use my open-access wiki site, start a page like

and we can jointly post and edit this and related pages and links, conceivably others can join in although that is not so important. The wiki has instructions on how to edit and post urls as well (the latter is easier for me to do as sysop). I could post relevant parts of our mails to establish the initial communications that ground the potentials for collaboration. It is much easier for me to edit the wiki than to write emails that get lost and disorganized in the course of time. If you agree, I will do an initial post, you can delete or edit or add new text as you like. I have no need for privacy in such sites (I have many such collaborations underway) as nobody has the knowledge to "scoop" an ongoing collaboration and if they did so our priority of open access publication precedes them.

Doug White

Sat, October 31, 2009 12:35 pm Doug

Good news for me too. I can certainly help with getting the GEDCOM export to be read in Pajek and getting the data in shape to be analyzed productively in Pajek and elsewhere.

I wrote an expert genealogical matching program to move John Padgett's project from 12,000 marriages to 98,000 as part of a Santa Fe Institute project. These were only choices among likely alternatives as to who were the father and mother of whom. As cited in Economic Credit and Elite Transformation in Renaissance Florence (p.23 fn 53: "Douglas White kindly wrote a computer matching algorithm that assisted in this linkage task, during our collaboration at the Santa Fe Institute, for which we thank him. This task is complicated by the fact that names are often not consistent across archival sources. Currently there are 1660 family genealogies in the dataset, viewable through Pajek"), John took it from there to compare candidate lineages containing potential relatives and determine the ascertainable kinship links. In Organizational Genesis, Identity and Control: The Transformation of Banking in Renaissance Florence he cites White and Jorion (1992) on my methods for visualizing the reconstructions of the Florentine genealogies, e.g., for computing structural endogamy and, with other kinds of ties, structural cohesion (cohesive blocking). John spent years in the Florentine archives extracting all of the censuses through the 1550s.

My problem with the Lille data was failure to inspire any of the Lille sociology or anthro faculty to collect the ethnographic and background data to go with the massive genealogical database. To do a useful project requires some intimate knowledge of the historical context which is what you have and makes it worth collaborating.

> Is it possible to visualize the entire network using some color-coding?

Yes, combining your intimate knowledge of what to code and mine on the network analytics.

> In your experience what do you think are the best ways of representing
> these kind of networks ? By countries, by nationality, by time-spans?

Too much to explain here, all of the above and more, you should review my home page "pdfs" and check the kinship articles.
It is the genealogical structure and dynamics through time that is particularly exciting scientifically as related to all these other variables through time.

> Hello,
> wow that's some great news, I am really happy about your interest.
> I am also very curious about your past project on Florence and Lille.
> That's a lot of data, how did you get the Florence individuals for those
> centuries? Did you use the Florence Census of 1427 ? (do I remember well the date ?)

Yes. There are lots of them, John transcribed them all for the Medieval period.


Fri, October 30, 2009 5:23 pm Matteo

Hello, wow that's some great news, I am really happy about your interest. I am also very curious about your past project on Florence and Lille. That's a lot of data, how did you get the Florence individuals for those centuries ? Did you use the Florence Census of 1427 ? (do I remember well the date ?)

My data is originally written in hr10 format, which is a proprietary format of the Heredis Pro 10 software from BSD Concept, one of the most used French software packages for genealogy. It has a few more fields than a GEDCOM and some more possible links between individuals. Of course I do export the data in GEDCOM format, usually when I upload it to the Geneanet servers. I tried to use the Pajek software but it has problems with the GEDCOMs I extract from my data... I still don't know why (and I still have big problems in understanding the logic behind Pajek).

I am completely excited about the possibilities of visualizations for a large scale genealogy, I would ask you thousands of questions but maybe I'll just let you tell me some of them so I can address better my imagination and develop some ideas.

Is it possible to visualize the entire network using some color-coding ? In your experience what do you think are the best ways of representing these kind of networks ? By countries, by nationality, by time-spans ?

Looking forward to hearing from you soon,

Matteo Giunti

Fri, October 30, 2009 11:36 am Doug

I would be happy to collaborate. We reconstructed 98,000 or so links for 12th-15th C Florence (w John Padgett) and I did so for 95,000 for Lille 19-20th C elites, American presidents data, European royalty, and Hapsburgs on that scale. My software works nicely on large datasets. I take it your data is in GED format? Easy to read in that case.

October 27, 2009 2:58 pm Matteo: first contact

Hello Dr. White, I read with interest your page about P-graph and social network analysis. I have a strong interest in this subject because I am the author of a very large database of Huguenot families from around 1550 to our days. The main project started by cataloguing most of the foreign merchants established in the city of Leghorn, Italy during 1650-1800, but soon I found out that there were many more links behind and I wanted to get deeper into the subject. I am not going to disturb you with too many details in my first e-mail, so I will just tell you that the database now contains about 90,000 people. Most of the people have French origin but there are many nationalities covered. The interesting thing is that a lot of these people emigrated and formed big networks abroad. Some of these networks are already known to historical researchers due to their intimate relationship with the formation of the Bank of England, the Banque de France and many smaller establishments and societies.

I think this database should seriously undergo some heavy analysis and be used to produce some interesting graphs and visualizations. I am not able, alone, to complete this task so I am trying to find people able to help me directly or by addressing me to the right people and software to use.

I thank you very much for your attention, hoping to hear from you soon.


Matteo GIUNTI Rome, Italy

P.S. basic data is online at the geneanet website (user: alivornesi)
