Nilesh ji,

First we need to look at what this Ne actually means.

Some of the fundamental equations of population genetics are based on something called the Hardy-Weinberg principle. This makes various assumptions, such as: mating is random (mating opportunities for all individuals in the population are equal), no mutation occurs, no natural selection occurs, no migration occurs (no introduction of new alleles from immigration or loss of alleles from emigration), no genetic drift occurs, etc. Given all these conditions, the Hardy-Weinberg principle says that the allele frequencies at each locus stay constant from one generation to the next, and that the genotype frequencies settle at equilibrium values that are calculable from those allele frequencies.
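To make "calculable" concrete: for a locus with two alleles at frequencies p and q = 1 - p, the textbook Hardy-Weinberg genotype frequencies are p^2, 2pq and q^2. A minimal Python sketch follows; the function name and the example frequency are made up for illustration.

```python
def hardy_weinberg(p):
    """Expected genotype frequencies (AA, Aa, aa) when allele A has frequency p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must be between 0 and 1")
    q = 1.0 - p  # frequency of the other allele, a
    return p * p, 2.0 * p * q, q * q

# Illustrative allele frequency, not taken from any real dataset.
freq_AA, freq_Aa, freq_aa = hardy_weinberg(0.7)
print(freq_AA, freq_Aa, freq_aa)
```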

But of course in the real world these things do occur! Migration and mutation tend to increase the diversity of alleles (population heterozygosity) over time; genetic drift, natural selection, founder effects etc. tend to decrease the diversity of alleles over time.

So, much of the rest of population genetics is based on introducing various estimators on top of the Hardy Weinberg equations to correct for all these things, and then refining the estimators as more and more observations of data prove to fit (or not fit) the population parameters that would be expected based on their existing definitions.

Ne, the "effective population size", is one parameter that changes when you consider a non-ideal population.

In an "ideal population", where:

1) The numbers of males and females are equal, and all are equally able to reproduce,

2) All individuals are equally likely to produce offspring, and the number of offspring that each produces varies no more than expected by chance,

3) All mating is random, and

4) The number of mating individuals does not change from one generation to the next (steady state population),

Ne exactly equals the census population size, the total number of mating-age individuals.

But no population is ideal! For example, what if one sex outnumbers the other? In this case it can be shown that Ne = 4NmNf/(Nm + Nf), where Nm is the number of males and Nf is the number of females.
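A quick way to get a feel for this formula is to plug in numbers. The sketch below is only an illustration; the function name and the example population sizes are hypothetical.

```python
def effective_size_sex_ratio(n_males, n_females):
    """Ne = 4*Nm*Nf / (Nm + Nf) for unequal numbers of breeding males and females."""
    if n_males <= 0 or n_females <= 0:
        raise ValueError("need at least one breeding male and one breeding female")
    return 4.0 * n_males * n_females / (n_males + n_females)

# Equal sexes: Ne equals the census size of 1000.
print(effective_size_sex_ratio(500, 500))
# Skewed sexes: the census size is still 1000, but Ne is much lower.
print(effective_size_sex_ratio(100, 900))
```

With 100 males and 900 females the formula gives an Ne of 360, even though 1000 individuals are breeding.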

So what is the relationship of Ne to the type of data that Karmin et al. are analyzing? To cut a long story short, a scientist called Kingman showed that the "coalescence time" for a particular allelic locus in a population is mathematically related to the Ne of that population. Coalescence time here means the time back to the point when all the ancestral lineages sharing that particular allele at that locus "coalesce" into a single individual ancestor.
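A toy simulation, which is not Kingman's derivation, can show the flavour of that relationship. It assumes the simplest possible setting: a haploid population of constant size N with non-overlapping generations and random choice of parents (so Ne = N), and just two sampled lineages. Going backwards, the two lineages pick the same parent in any given generation with probability 1/N, so the average coalescence time should come out close to N generations. All names below are hypothetical.

```python
import random

def pairwise_coalescence_time(n_e, rng):
    """Generations back until two lineages share a parent."""
    generations = 0
    while True:
        generations += 1
        # Each lineage picks its parent uniformly at random among n_e individuals.
        if rng.randrange(n_e) == rng.randrange(n_e):
            return generations

def mean_coalescence_time(n_e, replicates=5000, seed=1):
    """Average coalescence time over many independent replicates."""
    rng = random.Random(seed)  # fixed seed so reruns give the same numbers
    total = sum(pairwise_coalescence_time(n_e, rng) for _ in range(replicates))
    return total / replicates

# Illustrative population sizes only.
for n_e in (50, 200):
    print(n_e, mean_coalescence_time(n_e))
```

The larger population takes longer, on average, for the two lineages to meet, which is why an observed coalescence time carries information about Ne.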

Thus Ne becomes a factor in many Bayesian estimates of time to most recent common ancestor (TMRCA), which is related to coalescence time for a particular allelic locus. There are both frequentist and Bayesian methods to calculate TMRCA. Frequentist methods rely on something called the "infinite alleles model", which makes them prone to over- or under-estimating TMRCA depending on whether a greater or smaller number of allelic sites is studied, respectively. Bayesian methods are not prone to this problem, but to construct a posterior probability distribution for TMRCA they do require you to make assumptions about what the Ne was, as well as about the mutation rate at each locus under consideration.

In this paper, Walsh (2001) provides an often-used method for determining the Bayesian posterior distribution for TMRCA:

http://www.genetics.org/content/genetic ... 7.full.pdf

I cannot post mathematical equations here, but the term lambda in the equations on p 889 is defined as (Ne)^-1. These equations are used to estimate TMRCA from the inputs n (observed score of markers for each allelic locus studied between two individuals), mu (estimated mutation rate per locus), and lambda (the inverse of Ne).
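In place of the equations, a rough Python sketch may help show how n, mu and lambda combine. It is written in the spirit of an infinite alleles calculation, not copied from the paper, and it rests on assumptions that should be checked against the paper itself: n markers are compared between two individuals, k of them match (k is a symbol introduced here), a marker still matches after t generations with probability exp(-2*mu*t), and the prior on t is exponential with rate lambda = 1/Ne. The posterior is evaluated numerically on a grid of whole generations. All function names and example numbers are hypothetical.

```python
import math

def tmrca_posterior(n, k, mu, n_e, t_max=20000):
    """Posterior over t = 1..t_max generations; returns (times, probabilities)."""
    lam = 1.0 / n_e  # lambda, the inverse of Ne
    times = list(range(1, t_max + 1))
    log_weights = []
    for t in times:
        log_q = -2.0 * mu * t                  # log chance that one marker still matches
        log_prior = -lam * t                   # exponential prior, up to a constant
        log_like = k * log_q
        if n > k:
            # log(1 - q), computed via expm1 to stay accurate when q is close to 1
            log_like += (n - k) * math.log(-math.expm1(log_q))
        log_weights.append(log_prior + log_like)
    peak = max(log_weights)                    # subtract the peak to avoid underflow
    weights = [math.exp(w - peak) for w in log_weights]
    total = sum(weights)
    return times, [w / total for w in weights]

def posterior_mean(times, probs):
    """Mean of the gridded posterior, in generations."""
    return sum(t * p for t, p in zip(times, probs))

# Illustrative inputs only: 20 markers, 18 matching, assumed mu and Ne.
times, probs = tmrca_posterior(n=20, k=18, mu=0.002, n_e=5000)
print(round(posterior_mean(times, probs), 1))
```

Note that Ne enters only through the prior, which is exactly why it has to be assumed before any TMRCA comes out.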

Thus, TMRCA estimates calculated for data like that of Karmin et al, using statistical software like this,

https://taming-the-beast.github.io/tuto ... ine-plots/ can be used backwards to infer Ne. That is what the Bayesian skyline plots in the paper you linked are showing. They used coalescence times for alleles on the 320 Y chromosomes they sequenced to infer contemporaneous Ne for males, and coalescence times calculated for alleles on the mtDNA samples they sequenced to infer contemporaneous Ne for females.

It is interesting to note that, again looking the other way, there are no fewer than EIGHT different Bayesian methods currently in use to estimate TMRCA values! See

https://www.nature.com/articles/ejhg2015258 for details. The gist of it is, ALL these Bayesian methods rely on the investigator making some assumption or other regarding Ne, and those assumptions do not work equally well for all population demographic models. So in essence, the TMRCA value you get by these methods depends on what you assume happened to the population, historically/demographically speaking.
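A toy calculation shows how the assumed Ne can steer the answer. It assumes, purely for illustration, an infinite alleles model in which all n compared markers match between two individuals, a per-marker mutation rate mu, and an exponential prior on TMRCA with rate 1/Ne. Under those assumptions the posterior is itself exponential with rate 1/Ne + 2*n*mu, so its mean has a closed form. The function name and the numbers are hypothetical and are not taken from any of the papers above.

```python
def posterior_mean_tmrca_all_match(n, mu, n_e):
    """Posterior mean TMRCA (generations) when all n markers match."""
    rate = 1.0 / n_e + 2.0 * n * mu  # prior rate plus the contribution of the data
    return 1.0 / rate

# Same data (10 matching markers, mu = 0.002), different assumed Ne.
for n_e in (10, 100, 1000, 10000):
    print(n_e, round(posterior_mean_tmrca_all_match(10, 0.002, n_e), 1))
```

In this toy setting the estimate moves from about 7 generations to about 25 purely because of the Ne that was assumed. The effect shrinks as more markers are compared, since the data term then outweighs the prior.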

You can see the perils of this process if scientific rigor is not carefully applied and racist constructs like AIT, PIE etc. are taken to be the gospel truth. If your initial assumptions about Ne, based on prior assumptions about migration, invasion, civilizational collapse, population bottlenecks, founder effects etc. are bullshit, then the TMRCA value you get will be bullshit. The age of coalescence you derive for whatever allelic locus you are studying, be it M780 or Z93 or whatever, will be bullshit. Then if that age of coalescence is cited by other papers and used in other studies to reconstruct parallel but related population demographic models, those will also be bullshit. And so it goes, from Michael Bamshad to Martin Richards to Tony Joseph.