Out-of-India - From Theory to Truth: Part 2

RoyG · Post by **RoyG** » 02 Jan 2018 23:33

venug wrote:^^^ From the video posted by JEM ji, historical date of BG he gives is 1AD. Probably this is the date born out of the notion (wrong) that BG is influenced by Buddhism.

Good catch Venugji. Didn't watch the entire video. Just some quick conclusions I can draw from this:

This is a next level orientalism that is taking shape:

On the one hand he recognizes the futility of countering the likes of Adluri and Balu, so he incorporates their critique of the Protestant polemic indirectly by recognizing that Itihasa isn't read like traditional myth. This way he can deny that his stance is an Orientalist deconstruction. However, he slyly subsumes the approach within the larger Protestant story of liberation of the masses from the Jews and later the Catholics by positioning the date after Buddhism, thereby validating liberation philology. So in other words, the Gita text was Buddhist and was later corrupted by the cunning Brahmanical elite through "theological" deconstruction and reconstruction of the text. This is a very sophisticated argument. It could just be that I'm reading too much into it, but given the nature of the people we've been dealing with I doubt it.

SBajwa · Post by **SBajwa** » 03 Jan 2018 18:19

I was watching a speech of a Sikh preacher. He says that new year is mere birth of christ.

ramana · Post by **ramana** » 04 Jan 2018 02:52

How is that when Christ is a created figure?

SBajwa · Post by **SBajwa** » 04 Jan 2018 06:45

Just like the Hijri calendar marks the migration of Mohammad and the Bikrami calendar the birth of Raja Vikramaditya, a few centuries before Jesus Christ.

The real new year is when we (humans) started counting time, i.e., when we had the sense to understand the passing of time. I think that the Vedas, Ramayana and Mahabharata are the oldest known books and thus probably mark the beginning of time.

Prem · Post by **Prem** » 04 Jan 2018 08:11

There is one big old timer Jatt Sikh Padre who call Christ Satguru and spread all kind of fudd and mudd.
OTOH, does anyone have an idea how old the reference to kheer in Indian literature is? The oldest ones I found were when Dashrath did the Yagya for sons, and when Duryodhana told Lord Krishna that he had prepared kheer for him while Vidur had served him only saag.

A_Gupta · Post by **A_Gupta** » 04 Jan 2018 17:04

Prem wrote: OTOH, does anyone have an idea how old the reference to kheer in Indian literature is? The oldest ones I found were when Dashrath did the Yagya for sons, and when Duryodhana told Lord Krishna that he had prepared kheer for him while Vidur had served him only saag.

K.T. Achaya in his "A Historical Dictionary of Indian Food" says rice cooked in milk dates to Vedic times. The quote is:

"In Vedic times, it {rice} was cooked with water to yield odana (later called bhatka and currently baath), or with milk to give kshira (now kheer) or with sesame seed and milk to yield krsara (perhaps a forerunner of the khichdhi of the present."

RoyG · Post by **RoyG** » 04 Jan 2018 22:52

Guys, just found out the Rakhigarhi DNA results will be out soon. So far, based on what I've heard, there is a reason to smile but not cheer yet. We will have to wait for the full results.

I think more than anything, these results will officially disprove the traditional Witzel thesis, which is good for us. Makes sense given the noise S Danyal and others have been making in an attempt to pollute the waters before the full paper is released.

As the genetic/linguistic battle falls in our favor, the battleground has shifted to interpreting the texts themselves, where we are playing catch-up. Money, institutions, and media control matter more than ever now.

Prem Kumar · Post by **Prem Kumar** » 04 Jan 2018 23:19

That's good news, RoyG. Is it paanwallah info?

Regarding text interpretation, Talageri stands heads & shoulders above the rest. His study of the internal evidence within Rig Veda proves Out of India comprehensively.

But granted: Swadeshi Indology is lacking when it comes to interpreting our own texts. The etic perspective dominates. This is what Rajiv Malhotra & our own BRF kshatriyas are trying to counter.

ramana · Post by **ramana** » 04 Jan 2018 23:39

Prem, are you on Twitter? Look up my TL. The news is in Hindi.

RoyG · Post by **RoyG** » 04 Jan 2018 23:42

It's open source, from Hindi news. Talageri's thesis is preserved given the available information. Based on C-14 dating, the Rakhigarhi skeletons originate before 2500 BC. The DNA closely matches today's North Indian Brahmin profile, which means that mixing must've taken place prior to 1700 BC. This would mean that officially Witzel's thesis has been disproved. Now I can't say anything for sure b/c the paper hasn't been released to the public. So we'll have to wait and see.

Prem · Post by **Prem** » 05 Jan 2018 00:33

ramana wrote:Prem, You on Twitter? look up my TL. News is in hindi

I read the teaser news. The full report should be put out soon, and if the team is attacked by the usual suspects then it will only affirm that their hoax is exposed.

Prem Kumar · Post by **Prem Kumar** » 05 Jan 2018 00:37

Thanks Ramana & RoyG. I did a Google translate of that page: it wasn't a very good translation. It mentioned the North Indian Brahmin connection.

But was it a speculative article (like Tony Joseph's) or did it actually leak the results?

RoyG · Post by **RoyG** » 05 Jan 2018 01:06

Prem Kumar wrote:Thanks Ramana & RoyG. I did a Google translate of that page: it wasn't a very good translation. It mentioned the North Indian Brahmin connection.

But was it a speculative article (like Tony Joseph's) or did it actually leak the results?

Rajiv Malhotra mentioned that he got a little taste of the results while it was being put together. Said that it would be a game changer.

May have been a strategic decision to make recent swadeshi indology conf coincide w/ release date.

ramana · Post by **ramana** » 05 Jan 2018 01:17

I am waiting for another member to weigh in as he was eagerly waiting for a year or so.

A_Gupta · Post by **A_Gupta** » 05 Jan 2018 01:25

Dainik Jagaran:
https://www.jagran.com/news/national-ja ... 04852.html

A_Gupta · Post by **A_Gupta** » 05 Jan 2018 03:56

RoyG wrote:DNA closely matches today's North Indian Brahmin profile ...

What does "North Indian Brahmin profile" mean?
Here's one paper:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2755252/
The groups included in this study are classified as Shia, Sunni, Chaturvedis, Bhargavas, and Brahmins.

ramana · Post by **ramana** » 05 Jan 2018 03:59

No wonder all Muslims want to become Brahmins.

RoyG · Post by **RoyG** » 05 Jan 2018 04:00

A_Gupta wrote:Dainik Jagaran:
https://www.jagran.com/news/national-ja ... 04852.html

The AIT guys will try to spin it as a victory b/c the paper will prove that ANI came from outside. However, that is a position OOI agrees with them on. The issue is the date at which this happened and hence whether Vedic culture left India, which this paper will show. If there is any sort of mixing prior to 1700 BC, there is no stable architecture on which Witzel can base his theory, so it is effectively dead. The first hit came from the endogamy study, which proved that jati became solidified well after the establishment of Vedic culture on the subcontinent.

A_Gupta · Post by **A_Gupta** » 05 Jan 2018 07:34

RoyG wrote:
A_Gupta wrote:Dainik Jagaran:
https://www.jagran.com/news/national-ja ... 04852.html
The AIT guys will try to spin it as a victory b/c the paper will prove that ANI came from outside. However, that is a position OOI agrees with them on. The issue is the date at which this happened and hence whether Vedic culture left India, which this paper will show. If there is any sort of mixing prior to 1700 BC, there is no stable architecture on which Witzel can base his theory, so it is effectively dead. The first hit came from the endogamy study, which proved that jati became solidified well after the establishment of Vedic culture on the subcontinent.

Yes, the question is when did ANI arrive (in so far as ANI corresponds to a people, there is a danger of reification).

See what this paper says (emphasis added):

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5003663/

In South Asia, our dataset provides insight into the sources of Ancestral North Indians (ANI), a West Eurasian related population that no longer exists in unmixed form but contributes a variable amount of the ancestry of South Asians34,35 (Supplementary Information, section 9) (Extended Data Fig. 5). We show that it is impossible to model the ANI as being derived from any single ancient population in our dataset. However, it can be modelled as a mix of ancestry related to both early farmers of western Iran and to people of the Bronze Age Eurasian steppe; all sampled South Asian groups are inferred to have significant amounts of both ancestral types. The demographic impact of steppe related populations on South Asia was substantial, as the Mala, a south Indian population with minimal ANI along the ‘Indian Cline’ of such ancestry is inferred to have ~18% steppe-related ancestry, while the Kalash of Pakistan are inferred to have ~50%, similar to present-day northern Europeans.

Per the above ANI never corresponded to a people whose aDNA has been found so far. Perhaps ANI was never a people.

Agasthi · Post by **Agasthi** » 05 Jan 2018 07:47

https://www.thenewsminute.com/article/d ... oint-74201

By Nadika Nadja, who seems to be an archaeology student; however, her LinkedIn profile says she is a script writer and activist.

More on her/him: About Nadika:

I am Nadika, and I am a writer and researcher. I am currently part of a research that’s looking at ideas and expressions of culture in religious sites across India, and one on caste, culture, and caste based discriminations, in temples of Tamil Nadu.

I also write about cinema, media, history (mainly urban history)

A_Gupta · Post by **A_Gupta** » 05 Jan 2018 07:47

Is Amber G around?
I think that if I take a sample of the order of a hundred people, and build a correlation matrix between them based on the order of 100,000 parameters (that is the order of number of SNPs compared), then since it is a symmetric matrix it corresponds to a very high dimensional ellipsoid which will have a longest and a second longest axes, these are the two principal axes, e.g., from which ANI and ASI are constructed. ANI and ASI have reality if and only if when the sample size is increased, the axes remain pretty much unchanged. Otherwise they are simply artifacts of the specific data.

Some mathematics gyaan would be most welcome.
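A rough way to try the stability test described above, as a sketch only: simulate genotypes (0/1/2 copies of an allele per SNP) for two groups, take the leading principal axis from a small sample and from a larger one, and compare the two directions. Everything here is an assumption for illustration: the group sizes, the 2,000 SNPs, the allele frequencies and the `axis_stability` helper are made up, and this is not the procedure used in the ANI/ASI papers. The axes are compared in SNP space (the right singular vectors), since that is the only space shared by samples of different sizes.

```python
import numpy as np

def top_axis(genotypes):
    """Leading principal axis (in SNP space) of a samples x SNPs matrix."""
    centered = genotypes - genotypes.mean(axis=0)
    # SVD of the centered data avoids forming the SNP x SNP covariance matrix.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[0]

def simulate(n_per_group, freqs_a, freqs_b, rng):
    """Draw diploid genotypes for two groups with given allele frequencies."""
    a = rng.binomial(2, freqs_a, size=(n_per_group, freqs_a.size))
    b = rng.binomial(2, freqs_b, size=(n_per_group, freqs_b.size))
    return np.vstack([a, b]).astype(float)

def axis_stability(freqs_a, freqs_b, rng, small=50, large=200):
    """|cosine| between leading axes from a small and a large independent sample."""
    axis_small = top_axis(simulate(small, freqs_a, freqs_b, rng))
    axis_large = top_axis(simulate(large, freqs_a, freqs_b, rng))
    return abs(float(axis_small @ axis_large))  # abs: the sign of an axis is arbitrary

rng = np.random.default_rng(0)
n_snps = 2000  # hypothetical; real studies use on the order of 100,000
freqs_a = rng.uniform(0.1, 0.9, n_snps)
freqs_b = np.clip(freqs_a + rng.normal(0.0, 0.15, n_snps), 0.05, 0.95)

# Two genuinely different groups: the axis should survive a larger sample.
stability_structured = axis_stability(freqs_a, freqs_b, rng)
# One homogeneous group: the "axis" is sampling noise and should not survive.
stability_null = axis_stability(freqs_a, freqs_a, rng)

print("structured:", round(stability_structured, 3))
print("no structure:", round(stability_null, 3))
```

A value near 1 means the axis is essentially unchanged when the sample grows; a value near 0 means the axis was an artifact of the particular sample.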

shiv · Post by **shiv** » 05 Jan 2018 07:53

A_Gupta wrote:
Yes, the question is when did ANI arrive (in so far as ANI corresponds to a people, there is a danger of reification).

Check the paper by Reich et al 2009. Both ANI and ASI are dated more than 12,500 years old

A_Gupta wrote: See what this paper says (emphasis added):

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5003663/
In South Asia, our dataset provides insight into the sources of Ancestral North Indians (ANI), a West Eurasian related population that no longer exists in unmixed form but contributes a variable amount of the ancestry of South Asians34,35 (Supplementary Information, section 9) (Extended Data Fig. 5). We show that it is impossible to model the ANI as being derived from any single ancient population in our dataset. However, it can be modelled as a mix of ancestry related to both early farmers of western Iran and to people of the Bronze Age Eurasian steppe; all sampled South Asian groups are inferred to have significant amounts of both ancestral types. The demographic impact of steppe related populations on South Asia was substantial, as the Mala, a south Indian population with minimal ANI along the ‘Indian Cline’ of such ancestry is inferred to have ~18% steppe-related ancestry, while the Kalash of Pakistan are inferred to have ~50%, similar to present-day northern Europeans.
Per the above ANI never corresponded to a people whose aDNA has been found so far. Perhaps ANI was never a people.

You need to look at a 2011 (?) paper by Metspalu who defines what corresponds to ANI as "k5" and ASI as "k6"

The K5 found in Pakistan is older than the little found in Iran and further west. Both k5 and k6 are more than 10,000 years old

I really think Harappa time is too too recent - it was just yesterday. If we are looking at language spread (which is what AIT is all about) we need to get back before 5000 BC. We are all barking up the wrong tree. Mark my words

I wish those Indology conference talks come online soon.

Vayutuvan · Post by **Vayutuvan** » 05 Jan 2018 11:45

A_Gupta wrote:Is Amber G around?
I think that if I take a sample of the order of a hundred people, and build a correlation matrix between them based on the order of 100,000 parameters (that is the order of number of SNPs compared), then since it is a symmetric matrix it corresponds to a very high dimensional ellipsoid which will have a longest and a second longest axes, these are the two principal axes, e.g., from which ANI and ASI are constructed. ANI and ASI have reality if and only if when the sample size is increased, the axes remain pretty much unchanged. Otherwise they are simply artifacts of the specific data.

Some mathematics gyaan would be most welcome.

( See next post)

Vayutuvan · Post by **Vayutuvan** » 05 Jan 2018 11:47

~~As such 100 people pair wise correlation matrix has 10k entries. Where are you getting 100k from? Unless you are looking at a least squares problem.~~

I re-read agupta's post again. It is block diagonal matrix with 100,000 blocks each of 100x100.

The dimension of the matrix is 10 million x 10 million. Since it is block diagonal, the largest two eigenvalues and corresponding eigenvectors can be found fairly inexpensively (computationally speaking, modulo any numerical problems like ill conditioning). Even more of the top eigenpairs can be found as well.

But any visualization is futile. Visualization of four dimensional space is possible through ~~three~~ four projections onto the four 3D subspaces. Five dimensions probably is the limit for these kinds of projected visualizations.

Added later:

It need not necessarily be block diagonal but could have a strong block diagonal dominance so essentially can be treated as 100K decoupled problems, each of 100x100 size. Even the 100x100 need not be full. That said, even if it is full, each of the blocks is a very small problem.
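To illustrate why the block-diagonal case is cheap, here is a minimal sketch. It relies on the standard fact that the eigenvalues of a block-diagonal matrix are the union of the eigenvalues of its blocks, so the global top-k pairs must be among each block's own top-k. The block sizes and the helper name `top_eigenpairs_blockwise` are assumptions for illustration, and the blocks are random, not genetic data.

```python
import numpy as np

def top_eigenpairs_blockwise(blocks, k=2):
    """Top-k eigenpairs of a symmetric block-diagonal matrix, one block at a time.

    Returns (eigenvalue, block_index, eigenvector_within_block) tuples; the full
    eigenvector is that vector padded with zeros outside its block.
    """
    candidates = []
    for index, block in enumerate(blocks):
        values, vectors = np.linalg.eigh(block)  # eigenvalues in ascending order
        for j in range(1, min(k, len(values)) + 1):
            candidates.append((values[-j], index, vectors[:, -j]))
    candidates.sort(key=lambda c: c[0], reverse=True)
    return candidates[:k]

rng = np.random.default_rng(0)
blocks = []
for _ in range(5):            # 5 small blocks stand in for the 100K blocks
    m = rng.normal(size=(4, 4))
    blocks.append(m @ m.T)    # symmetric positive semi-definite block

# Assemble the dense matrix only to check the answer on this toy case.
dense = np.zeros((20, 20))
for i, block in enumerate(blocks):
    dense[4 * i:4 * i + 4, 4 * i:4 * i + 4] = block

blockwise_top = [value for value, _, _ in top_eigenpairs_blockwise(blocks, k=2)]
dense_top = np.linalg.eigvalsh(dense)[-2:][::-1]
print(blockwise_top, dense_top)
```

The dense matrix is never needed in practice; each block is solved independently, which is what makes the decoupled problems cheap. A merely block-diagonally dominant matrix would need an iterative method instead, which this sketch does not cover.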

The selection of those 100K attributes and interpretation of the results of the simulation should be left to those who are more evolutionary geneticists than mathematicians/statisticians (even if they are into mathematical biology).

I don't think this paper or similar approaches can throw any light on the existence of PIE. Even if it really existed, and say we find a 'Rosetta Stone' of sorts at some later date, it remains to be seen whether the reconstruction is remotely close to the fevered imaginations of the historical linguists.

A_Gupta · Post by **A_Gupta** » 05 Jan 2018 17:42

shiv wrote: You need to look at a 2011 (?) paper by Metspalu who defines what corresponds to ANI as "k5" and ASI as "k6"

This one?
https://www.sciencedirect.com/science/a ... 9711004885

PS: 500 generations ago is the latest that k5 could have entered India. Now these guys use 1 generation = 25 years which is where 12,500 years comes from. But what they really know from their simulations is number of generations which is 500.

We found no regional diversity differences associated with k5 at K = 8. Thus, regardless of where this component was from (the Caucasus, Near East, Indus Valley, or Central Asia), its spread to other regions must have occurred well before our detection limits at 12,500 years. Accordingly, the introduction of k5 to South Asia cannot be explained by recent gene flow, such as the hypothetical Indo-Aryan migration. The admixture of the k5 and k6 components within India, however, could have happened more recently—our haplotype diversity estimates are not informative about the timing of local admixture.
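To make the generations-versus-years point concrete, a small sketch: the 500 generations and the 25 years per generation are from the post above, while the 20 and 30 year alternatives are assumptions added only to show how much the calendar date moves with the chosen generation length.

```python
def years_from_generations(generations, years_per_generation):
    """Convert a generation count into calendar years."""
    return generations * years_per_generation

GENERATIONS = 500  # what the simulations actually constrain

for years_per_generation in (20, 25, 30):  # 25 is the paper's choice; 20 and 30 are assumed
    years = years_from_generations(GENERATIONS, years_per_generation)
    print(f"{years_per_generation} years/generation -> {years} years")
```

With 25 years per generation this reproduces the 12,500 year figure; the other two values bracket it at 10,000 and 15,000 years.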

shiv · Post by **shiv** » 05 Jan 2018 17:54

A_Gupta wrote:
RoyG wrote:
The AIT guys will try to spin it as a victory b/c the paper will prove that ANI came from outside. However, that is a position OOI agrees with them on. The issue is the date at which this happened and hence whether Vedic culture left India, which this paper will show. If there is any sort of mixing prior to 1700 BC, there is no stable architecture on which Witzel can base his theory, so it is effectively dead. The first hit came from the endogamy study, which proved that jati became solidified well after the establishment of Vedic culture on the subcontinent.
Yes, the question is when did ANI arrive (in so far as ANI corresponds to a people, there is a danger of reification).

See what this paper says (emphasis added):

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5003663/
In South Asia, our dataset provides insight into the sources of Ancestral North Indians (ANI), a West Eurasian related population that no longer exists in unmixed form but contributes a variable amount of the ancestry of South Asians34,35 (Supplementary Information, section 9) (Extended Data Fig. 5). We show that it is impossible to model the ANI as being derived from any single ancient population in our dataset. However, it can be modelled as a mix of ancestry related to both early farmers of western Iran and to people of the Bronze Age Eurasian steppe; all sampled South Asian groups are inferred to have significant amounts of both ancestral types. The demographic impact of steppe related populations on South Asia was substantial, as the Mala, a south Indian population with minimal ANI along the ‘Indian Cline’ of such ancestry is inferred to have ~18% steppe-related ancestry, while the Kalash of Pakistan are inferred to have ~50%, similar to present-day northern Europeans.
Per the above ANI never corresponded to a people whose aDNA has been found so far. Perhaps ANI was never a people.

The paper is very clear and not at all inconsistent from what has been reported earlier.

Indian populations have a mixture of ANI and ASI ranging from 60:40 in the north to 40:60 in the south (in general). Even tribals in India and south Indian language speakers have ANI, while Pakistanis, Pathans etc. all have ASI, which includes genes common to Andamanese and Onge tribals. We are talking about a thorough gene mix from Afghanistan to the tip of peninsular India and way off to the east. This is no migration that occurred in historic times as recently as 1000 BC. Recall that Tamil Sangam literature was already there just 700 years later, and even fake AIT dates place Panini and Buddha around 500-600 BC. 400 years or 12-13 generations is hardly enough time for ANI genes to penetrate every corner of India, including far south tribals living in impenetrable jungle.

AIT dates are wrong, and as I have been pointing out time and again, genetics researchers searching for references find AIT refs cooked up by cunning linguists. But they are not going to ask how ANI genes went across all Indians in just 20 generations. That is not their mandate.

It is our duty to call out the linguists' bluff. It is high time we stopped being babies here and depending on geneticists to say something. Geneticists have said enough and they have no explanation for language. That is a linguistic bluff, and we have to take up the burden of pointing out the inconsistency of genetics findings with the language spread theory.

I repeat

THIS IS NOT ABOUT MIGRATION. IT IS ABOUT LANGUAGE

The migration was cooked up to lay claim on the language

A_Gupta · Post by **A_Gupta** » 05 Jan 2018 18:10

Ah, I get it: the Metspalu k5 component, whatever its origin, as he writes, was extant >=500 generations ago, or roughly 12,500 years ago. The Iranian and Steppes aDNA are from at best 10,000 years ago. Therefore trying to explain k5 as a descendant of Iranian and Steppes aDNA does not make any sense, logically speaking. If k5 ~ ANI, then that explains things somewhat.

PS: exposing my confusion and ignorance about Metspalu v Lazaridis on my blog:
http://arunsmusings.blogspot.com/2018/0 ... ng-to.html

PPS: we shouldn't equate genes to language; but (look at the diagram from Metspalu on my blog page) perhaps k5 in India was the original IE people, k5 masks the signal of spread of IE from India to Central Asia, and it is k4 that carried IE from Central Asia into Europe via invasion which is supposedly genetically and archaeologically attested to?

shiv · Post by **shiv** » 05 Jan 2018 19:13

A_Gupta wrote:
shiv wrote: You need to look at a 2011 (?) paper by Metspalu who defines what corresponds to ANI as "k5" and ASI as "k6"
This one?
https://www.sciencedirect.com/science/a ... 9711004885

PS: 500 generations ago is the latest that k5 could have entered India. Now these guys use 1 generation = 25 years which is where 12,500 years comes from. But what they really know from their simulations is number of generations which is 500.

We found no regional diversity differences associated with k5 at K = 8. Thus, regardless of where this component was from (the Caucasus, Near East, Indus Valley, or Central Asia), its spread to other regions must have occurred well before our detection limits at 12,500 years. Accordingly, the introduction of k5 to South Asia cannot be explained by recent gene flow, such as the hypothetical Indo-Aryan migration. The admixture of the k5 and k6 components within India, however, could have happened more recently—our haplotype diversity estimates are not informative about the timing of local admixture.

More info here in a paper by Thangaraj - a co author of the original ASI/ANI Reich paper

Note "Late Pleistocene ancestry" of Indians - Pleistocene ended about 12000 years ago when Holocene started

AIT date is 3000 years ago. Balderdash

www.ias.ac.in/article/fulltext/jbsc/037/05/0911-0919

It is commonly believed that there was an Aryan invasion/migration to India from the west. However, there is prolonged debate on this topic. It has been well established that various castes and tribal populations of India have a common late Pleistocene maternal as well as paternal ancestry and minor east and west Eurasian ancestries (Kivisild et al. 2003; Metspalu et al. 2004, 2011; Sahoo et al. 2006; Sengupta et al. 2006; Chaubey et al. 2007, 2008; Reich et al. 2009; Shah et al. 2011; Sharma et al. 2012). Most of these studies presumed that the detected west Eurasian genepool may be the Aryan component. Interestingly, both the ANI and ASI ancestry components of the Indian populations are found to harbour higher haplotypic diversity than those predominant in west Eurasia. The shared genetic affinity between the ANI component of northern India and west Eurasia was dated prior to the Aryan invasion (Metspalu et al. 2011). These realities suggest the rejection of the Aryan invasion hypothesis but support an ancient demographic history of India.

shiv · Post by **shiv** » 05 Jan 2018 19:17

The image below analyses how ASI and ANI originated

An out of Africa group went on to become ancestors of Andamanese. But this line branched off twice. The first branch split into two again - with one line going to Europe and another line going to India creating ANI

The other branch of the Andamanese group went direct to India as ASI

shiv · Post by **shiv** » 05 Jan 2018 19:20

This image shows the early peopling of India
https://drive.google.com/file/d/0B3JNY4 ... sp=sharing

A_Gupta · Post by **A_Gupta** » 05 Jan 2018 20:11

With regard to some more recent papers ( via my blog: http://arunsmusings.blogspot.com/2016/0 ... india.html ) I believe the above diagram might be extended like this (the topology is correct, but the absolute positions of junctions and lengths of legs do not reflect time-depth). E.g., the AmeriIndian-East Asian split is likely 24K years ago. (https://news.nationalgeographic.com/201 ... ia-genome/ )

Rudradev · Post by **Rudradev** » 06 Jan 2018 00:18

As a caution to all the above inference, I want to bring up one of the oldest puzzles in phylogeography.

Genetic data by itself offers NO way to distinguish between two quantities called divergence time and coalescence time. These quantities are both periods of time, but distinguishing how much of observed genetic variation is attributable to each of them is absolutely critical to coming up with meaningful inferences from genetic data.

Divergence time for a pair of populations is the time before present at which they both separated from a common ancestral population. In reality this type of separation would probably have taken centuries if not millennia to completely occur. In many of the mathematical models used in phylogeography, however, it is considered to have happened instantaneously. Which is itself a problem. But anyway.

Coalescence time for a group of individuals carrying copies of a certain allele (a certain DNA marker, such as a Y-SNP) observed WITHIN a population is the time to the most recent common ancestor for all those individuals who carried that particular allele.

Given genetic sequence analysis for a single allelic locus (a particular physical location in some particular chromosome), there is absolutely NO way to tell how much of a given quantum of variation between two individuals is attributable to divergence time, vs. coalescence time. You can make assumptions, build mathematical models based on those assumptions, and try to finesse those models in such a way that looking at more markers (or allelic loci) gives you a better chance of estimating correctly the extent to which divergence time and coalescence time each played a role. But they are only assumptions.

Let me give an example. Take the genomes of three individuals: Arun, Shiv, and myself. Arun I assume is a "northie". Shiv and I hail from different regions of the great state of Karnataka.

So, in terms of coalescence, Shiv and I may very well have a more recent common ancestor than Arun and I. It's a small Karnataka, after all.

However, my ancestors were Saraswats who left Northern India and meandered southwards, making many stops along the way, arriving at last at the Konkan and then finally during the Portuguese inquisition hot-footing it to Dakshina Kannada.

For the sake of this story I am going to assume (again, sorry) that Shiv is not a Saraswat and that a predominant section of his lineage came to Karnataka at some totally different time, perhaps long before mine did... and perhaps from some totally different part of India than mine did.

Now let's say Arun's ancestors (assuming yet again) have been in the north of India as long as anyone can remember and his lineage has remained North Indian until his birth.

So in terms of divergence, my ancestors may have diverged from Arun's ancestors much later than Shiv's ancestors from Arun's ancestors.

Thus coalescence may be (Arun-Rudradev, Arun-Shiv >>> Rudradev-Shiv) while divergence might be more like (Arun-Rudradev <<< Arun-Shiv, Rudradev-Shiv).

However, a standard genetic sequencing test performed on all three of us, and application of standard frequentist statistical analysis to the results, will reveal only that Shiv and I have a more recent common ancestor for a particular allelic locus than Arun and I. [Aside: Just for fun, if Shiv speaks Kannada, I speak Konkani, and Arun speaks Hindi as his mother tongue, consider the obvious pitfalls of relating genetic inferences to language-family distribution.]

One way phylogeography people try to get around this is to test tons of individuals: say all BRFites, or even all Indians. They also try to look at more allelic loci, using variations among all the genealogies tested at the different loci to try and disentangle divergence time vs. coalescence time as contributors to the observed overall variation. This may involve looking for sharp peaks in an observed-variation-vs-time plot (suggesting divergence if you treat it as instantaneous), or it may involve trying to construct tree topologies along the genome in an attempt to characterize the ancestral coalescence process within an assumed population, so that the remainder of observed variation can be (they hope) attributed to divergence between populations. However, at the end of the day it's just computationally-expensive guesswork.
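A toy simulation of the point above, as a sketch under textbook assumptions that are mine and not taken from any of the papers discussed: an instantaneous split, no migration afterwards, and a constant-size diploid ancestral population. Under those assumptions, one lineage from each population cannot share an ancestor until the split, and from there the extra waiting time is exponential with mean 2 x Ne generations. The two scenarios and their numbers are hypothetical.

```python
import numpy as np

def simulate_coalescence_times(divergence_gens, ancestral_ne, n_loci, rng):
    """Per-locus time (in generations) to the common ancestor of two lineages,
    one sampled from each of two populations that split divergence_gens ago."""
    # Extra waiting time inside the shared ancestral population.
    waiting = rng.exponential(scale=2 * ancestral_ne, size=n_loci)
    return divergence_gens + waiting

rng = np.random.default_rng(1)
n_loci = 20000

# Scenario A: recent split, large ancestral population.
scenario_a = simulate_coalescence_times(1000, 500, n_loci, rng)
# Scenario B: older split, small ancestral population.
scenario_b = simulate_coalescence_times(1500, 250, n_loci, rng)

mean_a, mean_b = scenario_a.mean(), scenario_b.mean()
std_a, std_b = scenario_a.std(), scenario_b.std()
min_a, min_b = scenario_a.min(), scenario_b.min()

print("mean per-locus time:", round(mean_a), round(mean_b))
print("spread across loci: ", round(std_a), round(std_b))
print("shallowest locus:   ", round(min_a), round(min_b))
```

Both scenarios are built to have the same expected time per locus (1000 + 2 x 500 = 1500 + 2 x 250 = 2000 generations), so a single locus cannot tell them apart. Only the spread across many loci, and the shallowest locus, separate the split time from what happened in the ancestral population, and even then only if the model assumptions hold.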

The trouble with testing huge numbers of individuals and looking at multiple allelic loci is that the noise-to-signal ratio also becomes amplified (among other things, because of a problem known as homoplasy: multiple individuals might share the same allele at a certain locus, but might have evolved it through entirely different evolutionary paths, and may not in fact have a common ancestor with that allele, yielding a false positive).

The standard process then is to employ brute force matrix-giri. Just as Arun said, sort all the data along the two vectors associated with the largest eigenvalues, and try to match the categorical data distribution you get (if one is clearly visible at all) with "other" evidence (archaeological, linguistic, "historical" etc.) See how well it fits. If it looks embarrassing, pick the next pair of vectors and try again (if you look carefully at "methods" sections of many of these papers, you can see how arbitrary the authors can sometimes become with their choices of principal components).

So, yes indeed. Just about all the literature in this field expands historically in the direction of looking at more allelic loci in a given study (including first short-tandem-repeats, then Y- or mt-DNA SNPs, then autosomal DNA loci, in ever increasing numbers); testing more individuals, made possible by such techniques as high-throughput sequencing; and using more public-domain information (such as data from the 1000 genomes project) as a basis for constructing prior probability distributions against which to assess your data in a Bayesian manner.

Therefore, as Arun correctly observed, categories like "ANI" and "ASI" are simply principal-component-based distributions of the data obtained by looking at some lakhs of autosomal loci in some 100s of individuals. Some believe them to supersede the earlier constructs obtained by merely looking at dozens of Y-chromosomal loci in smaller numbers of individuals, such as "R1a1" and what have you. (This is a simplification, of course; Y-chromosomal data has its own peculiarities, advantages, and disadvantages and isn't strictly comparable to autosomal data for many reasons). Anyway, better techniques for sequencing and crunching data will lead to further (perhaps better) definition of principal components and ANI, ASI may very well become obsolete in their time.

The fact that estimates for the arrival of "ANI" in India vary from 12,500 years ago to 45,000 years ago demonstrates how very far away we are from the kind of studies that might actually resolve the difference between divergence time and coalescence time (32,500 years is a pretty big margin of waffling by any standards).

What does all this mean for "aDNA"? Well, the Phylogeography of Europe paper linked by Arun earlier gives some idea. The number of individuals you can sequence is sooooo small that you are very often left waving your hands about divergence vs. coalescence (ummm, er, haplogroup G2a is common, but F is rare, and F shows up in India but let's talk about Proto-Indo-European languages instead).

A_Gupta · Post by **A_Gupta** » 06 Jan 2018 01:16

^^^ Thanks, Rudradev, I'm going to read very carefully what you wrote until I've understood it fully.

Vayutuvan · Post by **Vayutuvan** » 06 Jan 2018 01:26

shiv wrote: The migration was cooked up to lay claim on the language

Excellent.

Vayutuvan · Post by **Vayutuvan** » 06 Jan 2018 01:35

Rudradev, one quick question. Is it possible to exhume bodies (do they even exist) of people from say 25,000 years back at intervals say 1000 years from different locations of interest? Can the divergence/coalescence profile over time be reconstructed?

Finding the principal eigenpairs of 25 problems of this size is also not all that time consuming. More samples have to be taken around any epoch where the smoothness is lost.

Nilesh Oak · Post by **Nilesh Oak** » 06 Jan 2018 02:02

Rudradev,

Question for you.

In this paper by Karmin, Monika, et al http://genome.cshlp.org/content/25/4/45 ... 4b7105a9b4

The paragraph (page 461) just before the "Discussion" section talks of a reduction in male Ne (at around 8-4 kya), when female Ne was 17-fold higher during this time.

My nooby question.. what parameters are they using (I hear they are employing BSP, Bayesian skyline plots) to estimate these Ne for male and female, especially for an ancient time interval? Any metaphor/allegory to explain how they go about estimating it for an ancient timeline...in the absence of aDNA?

Appreciate your help,
(If you look at the supplementary material, they have visuals (figures) drawn for populations in the various parts of the world.)

Nilesh

Rudradev · Post by **Rudradev** » 06 Jan 2018 02:10

Vayutuvan wrote:Rudradev, one quick question. Is it possible to exhume bodies (do they even exist) of people from say 25,000 years back at intervals say 1000 years from different locations of interest? Can the divergence/coalescence profile over time be reconstructed?

Finding the principal eigenpairs of 25 problems of this size is also not all that time consuming. More samples have to be taken around any epoch where the smoothness is lost.

Vayutuvan, the general rule of thumb is more: you work with what you find. So it's not as if you have your pick of individuals to draw a sample from, particularly as regards aDNA. Typically they are outliers who happened to be well enough preserved that you can isolate nucleic acid from what's left of their tissues, and hopefully purify it into something recognizably human.

Also, it's generally a little more technical than finding eigenpairs might be in other situations. The statistical technique commonly used is something called MCMCcoal (MCMC standing for Markov Chain Monte Carlo). Here is a YouTube video with a guy giving a fairly accessible presentation on how they used it on a small scale to distinguish between coalescence and divergence times for a study of San (Kalahari Bushman) and Bantu genomes.

https://www.youtube.com/watch?v=TkvzrpYkOzY

Vayutuvan · Post by **Vayutuvan** » 06 Jan 2018 02:27

RD, thanks.

As I presumed, finding eigenpairs is secondary to phylogenetics. Mathematicians/physicists/ComputerSci/Computational Sci people can help only in solving once the modeling is done. Interpretation again has to be carried out by the modeling geneticists because they are the ones who know what assumptions they used while modeling, in addition to the limitations of the chosen model. A model (mathematical or otherwise), by definition, is an approximation of the phenomenon/process/physical reality that is being modeled/represented using (mostly) differential/partial differential equations. Error bounds play an important role as well.

Rudradev · Post by **Rudradev** » 06 Jan 2018 03:09

Nilesh ji,

First we need to look at what this Ne actually means.

Some of the fundamental equations of population genetics are based on something called the Hardy-Weinberg principle. This makes various assumptions, such as: mating opportunities for all individuals in the population are equal, no mutation occurs, no natural selection acts, no migration occurs (no introduction of new alleles from immigration or loss of alleles from emigration), no genetic drift occurs, etc. Given all these conditions, the Hardy-Weinberg principle says that the allele frequencies at each locus stay constant from generation to generation, and the genotype frequencies reach equilibrium values that are calculable from those allele frequencies.
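To make the equilibrium statement concrete, here is a minimal sketch for a single locus with two alleles. The function names and the starting frequency of 0.3 are made up for illustration; the p^2, 2pq, q^2 genotype frequencies are the standard Hardy-Weinberg result under the ideal assumptions listed above.

```python
def hardy_weinberg(p):
    """Expected genotype frequencies (AA, Aa, aa) when allele A has frequency p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    q = 1.0 - p  # frequency of the other allele, a
    return p * p, 2.0 * p * q, q * q


def next_generation_allele_freq(p):
    """Frequency of allele A among offspring of a population in equilibrium."""
    f_AA, f_Aa, _ = hardy_weinberg(p)
    # Every AA parent passes on A; an Aa parent passes on A half the time.
    return f_AA + 0.5 * f_Aa


if __name__ == "__main__":
    p0 = 0.3  # hypothetical starting frequency of allele A
    print(hardy_weinberg(p0))
    print(next_generation_allele_freq(p0))
```

Under the ideal assumptions the allele frequency comes back unchanged, which is the whole point: any drift away from these numbers in real data has to be explained by one of the violated assumptions.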

But of course in the real world these things do occur! Migration and mutation increase the diversity of alleles (population heterozygosity) over time; genetic drift, natural selection, founder effects etc. will decrease diversity of alleles over time.

So, much of the rest of population genetics is based on introducing various estimators on top of the Hardy Weinberg equations to correct for all these things, and then refining the estimators as more and more observations of data prove to fit (or not fit) the population parameters that would be expected based on their existing definitions.

Ne, the "effective population size", is one parameter that changes when you consider a non-ideal population.

In an "ideal population", where:
1) Number of males and females are equal, and all are equally able to reproduce,
2) All individuals are equally likely to produce offspring, and the number of offspring that each produces varies no more than expected by chance,
3) All mating is random, and
4) The number of mating individuals does not change from one generation to the next (steady state population),

Ne exactly equals the census population size, the total number of mating-age individuals.

But no population is ideal! For example, what if one sex outnumbers the other? In this case it can be shown that Ne = 4NmNf/(Nm + Nf), where Nm is the number of males and Nf is the number of females.
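A quick numerical illustration of that formula, as a minimal sketch. The function name and the head counts are hypothetical; only the formula Ne = 4NmNf/(Nm + Nf) comes from the text above.

```python
def effective_size_sex_ratio(n_males, n_females):
    """Effective population size when the two sexes are unequal in number."""
    if n_males <= 0 or n_females <= 0:
        raise ValueError("both sexes must be present")
    return 4.0 * n_males * n_females / (n_males + n_females)


if __name__ == "__main__":
    # Balanced population: Ne equals the census size of 1000.
    print(effective_size_sex_ratio(500, 500))
    # Same census size, but only 100 breeding males: Ne drops to 360.
    print(effective_size_sex_ratio(100, 900))
```

The rarer sex dominates the result, so a population of 1000 with few breeding males behaves, genetically, like a much smaller one.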

So what is the relationship of Ne to the type of data that Karmin et al are analyzing? To cut a long story short, a scientist called Kingman showed that "coalescence time" for a particular allelic locus in a population is mathematically related to the Ne at the time when all the ancestral lineages sharing that particular allele at that locus "coalesced" into a single individual ancestor.
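The link between Ne and coalescence time can be sketched numerically. This assumes the simplest textbook case, which is my assumption and not something taken from Karmin et al: a haploid, constant-size population (think Y chromosomes or mtDNA), where k lineages wait on average 2Ne/(k(k-1)) generations before two of them merge. The function names are made up for illustration.

```python
import random


def expected_tmrca(n_lineages, ne):
    """Expected generations back to the common ancestor of n sampled lineages."""
    # Sum the mean waiting times while there are k = n, n-1, ..., 2 lineages.
    return sum(2.0 * ne / (k * (k - 1)) for k in range(2, n_lineages + 1))


def simulate_tmrca(n_lineages, ne, rng):
    """Draw one random TMRCA (in generations) from the same coalescent model."""
    t = 0.0
    k = n_lineages
    while k > 1:
        rate = k * (k - 1) / (2.0 * ne)  # chance per generation that some pair merges
        t += rng.expovariate(rate)
        k -= 1
    return t


if __name__ == "__main__":
    rng = random.Random(1)
    draws = [simulate_tmrca(10, 1000, rng) for _ in range(20000)]
    print(expected_tmrca(10, 1000))   # analytic expectation
    print(sum(draws) / len(draws))    # simulated average, should land close to it
```

Note how wide the spread of individual draws is around the mean. A single locus gives a very noisy clock, and the expected value scales directly with whatever Ne you plug in.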

Thus Ne becomes a factor in many Bayesian estimates of time to most recent common ancestor (TMRCA), which is related to coalescence time for a particular allelic locus. There are both frequentist methods and Bayesian methods to calculate TMRCA. Frequentist methods rely on something called the "infinite alleles hypothesis", which makes them prone to over- or under-estimating TMRCA based on whether a greater or smaller number of allelic sites are studied, respectively. Bayesian methods are not prone to this problem, but they do require you to make assumptions about what the Ne was in order to construct a posterior probability distribution for TMRCA, as well as the mutation rate for each allele under consideration.

This paper by Walsh (2001) provides an often-used method for determining the Bayesian posterior distribution for TMRCA:
http://www.genetics.org/content/genetic ... 7.full.pdf

I cannot post mathematical equations here, but the term lambda in the equations on p 889 is defined as (Ne)^-1. These equations are used to estimate TMRCA from the inputs n (observed score of markers for each allelic locus studied between two individuals), mu (estimated mutation rate per locus), and lambda (the inverse of Ne).
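Since equations can't be posted, here is a rough numerical sketch of the idea instead. It is a simplified illustration and not a transcription of Walsh's equations: it assumes an infinite-alleles style likelihood, where two individuals still match at a marker after t generations with probability exp(-2*mu*t), combined with an exponential prior on t with rate lambda = 1/Ne. The function names and all the numbers are hypothetical.

```python
import math


def tmrca_posterior(n_markers, n_matches, mu, ne, t_max=5000):
    """Posterior over TMRCA (t = 1..t_max generations), evaluated on a grid."""
    lam = 1.0 / ne  # prior rate: the inverse of the assumed Ne
    weights = []
    for t in range(1, t_max + 1):
        q = math.exp(-2.0 * mu * t)  # chance a single marker still matches
        # The binomial coefficient is constant in t, so it cancels on normalizing.
        likelihood = q ** n_matches * (1.0 - q) ** (n_markers - n_matches)
        prior = lam * math.exp(-lam * t)
        weights.append(likelihood * prior)
    total = sum(weights)
    return [w / total for w in weights]


def posterior_mean(posterior):
    """Mean TMRCA in generations; grid index i corresponds to t = i + 1."""
    return sum((i + 1) * p for i, p in enumerate(posterior))


if __name__ == "__main__":
    # Same data (18 of 20 markers match), two different assumed values of Ne.
    for assumed_ne in (500, 10000):
        post = tmrca_posterior(20, 18, mu=0.002, ne=assumed_ne)
        print(assumed_ne, round(posterior_mean(post), 1))
```

The data are identical in both runs; only the assumed Ne differs, and the estimated TMRCA moves with it. The larger the assumed Ne, the older the answer.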

Thus, TMRCA estimates calculated for data like that of Karmin et al, using statistical software like this, https://taming-the-beast.github.io/tuto ... ine-plots/ can be used backwards to infer Ne. That is what the Bayesian skyline plots in the paper you linked are showing. They used coalescence times for alleles on the 320 Y chromosomes they sequenced to infer contemporaneous Ne for males, and coalescence times calculated for alleles on the mtDNA samples they sequenced to infer contemporaneous Ne for females.
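To give a feel for what "using it backwards" means, here is a minimal sketch of the older, non-Bayesian "classic skyline" estimator. The Bayesian skyline in the software linked above is considerably more elaborate, so treat this as an analogy and not as the paper's method. It assumes haploid lineages and coalescence times measured in generations; the function name and the input times are invented.

```python
def classic_skyline(coalescent_times):
    """Estimate Ne in each interval between successive coalescence events.

    coalescent_times: times (generations before present) at which two lineages
    merge in a genealogy; n - 1 events for n sampled lineages.
    Returns a list of (interval_start, interval_end, ne_estimate) tuples.
    """
    times = sorted(coalescent_times)
    k = len(times) + 1  # lineages present at time zero
    estimates = []
    previous = 0.0
    for t in times:
        interval = t - previous
        # With k lineages the expected wait is 2*Ne / (k*(k-1)); invert it for Ne.
        ne_hat = interval * k * (k - 1) / 2.0
        estimates.append((previous, t, ne_hat))
        previous = t
        k -= 1
    return estimates


if __name__ == "__main__":
    # Four lineages whose mergers happen 100, 300 and 900 generations ago.
    for start, end, ne_hat in classic_skyline([100.0, 300.0, 900.0]):
        print(start, end, ne_hat)
```

Long gaps between coalescence events read as a large Ne for that period, and a burst of closely spaced events reads as a small Ne. That is the sort of signal behind a reported dip in male Ne, and it is only as good as the dates assigned to the coalescence events in the first place.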

It is interesting to note that, again looking the other way, there are no fewer than EIGHT different Bayesian methods currently in use to estimate TMRCA values! See https://www.nature.com/articles/ejhg2015258 for details. The gist of it is, ALL these Bayesian methods rely on the investigator making some assumption or other about Ne, and those assumptions do not work equally well for all population demographic models. So in essence, the TMRCA value you get by these methods depends on what you assume happened to the population, historically/demographically speaking.

You can see the perils of this process if scientific rigor is not carefully applied and racist constructs like AIT, PIE etc. are taken to be the gospel truth. If your initial assumptions about Ne, based on prior assumptions about migration, invasion, civilizational collapse, population bottlenecks, founder effects etc. are bullshit, then the TMRCA value you get will be bullshit. The age of coalescence you derive for whatever allelic locus you are studying, be it M780 or Z93 or whatever, will be bullshit. Then if that age of coalescence is cited by other papers and used in other studies to reconstruct parallel but related population demographic models, those will also be bullshit. And so it goes, from Michael Bamshad to Martin Richards to Tony Joseph.

A_Gupta · Post by **A_Gupta** » 06 Jan 2018 03:20

^^^ some public course-work here: https://quizlet.com/77587591/coalescenc ... ash-cards/
"Coalescence Theory and the Genealogy of Genes"