Why is the correlation of individual gene variants with IQ apparently so weak?
The starting point of the following
considerations is the fact that the 10 most intelligent dog breeds are morphologically
quite different from each other. Since intelligence depends on many genes, and
since at least half the genes of our genome are expressed in the brain, we can
infer that the intelligence genes that segregated together with those
determining each breed's distinctive morphology are, for the most part,
different from one breed to another. In other words, there would be different
paths to intelligence, and by paths I mean different clusters of gene variants
that lead to the same result: high cognitive performance. Of course, those
paths may partially overlap, for example if three of them led to a higher
number of dendrites in the same areas of the brain. So their combined effect
would very probably not be strictly additive but subject to a ceiling.
Therefore, the phase space of IQ levels with
respect to the hundreds of genes that determine intelligence would not be a
simple linear one. Rather, there would be many islands of excellence, each one
comprising a handful of different gene variants.
If these premises are reasonable, the view that high intelligence is the result
of many small incremental effects of hundreds of genes would be misguided. The
frustration of the many scientists searching for "intelligence genes", who find
only a marginal effect for each one, would then stem from looking at individual
gene variants in isolation. If we knew a priori which specific clusters of
variants are involved, analyzing each cluster as a unit would yield much
stronger associations. The problem, of course, is that we cannot have that
information ex ante. (Incidentally, the same reasoning could be applied to any
trait other than intelligence.)
To illustrate this, I have made a simple simulation in Excel
(http://www.c3c.es/inteliclusters.htm) that shows how much apparent
correlation is lost when looking at single variants, compared with the
correlations obtained when looking at whole clusters of variants.
I have represented the "genome" of 20 individuals, each consisting of 20 genes.
Among them, four different clusters of genes are shown in color. Each of these
clusters has, by itself, a significant effect on intelligence. The number of
white columns is not relevant for this analysis; just consider them a summary
of "the rest of the genome". Each cell holds a randomly assigned score (1, 2 or
3) that stands for the final "fitness value", with regard to intelligence, of
that individual's pair of alleles for a given gene. Just imagine 1 = aa,
2 = Aa, 3 = AA, where A is the variant that promotes intelligence.
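For readers who want to reproduce this layout outside Excel, here is a minimal Python sketch. The following are assumptions, not taken from the spreadsheet: scores are drawn uniformly from {1, 2, 3}, and the four clusters of four genes each sit in hypothetical columns 0-3, 4-7, 8-11 and 12-15, leaving columns 16-19 as "the rest of the genome". `random_genome` and `CLUSTERS` are illustrative names.

```python
import random

N_INDIVIDUALS = 20
N_GENES = 20

# Hypothetical layout: four clusters of four genes each (column indices);
# columns 16-19 stand for "the rest of the genome".
CLUSTERS = [range(0, 4), range(4, 8), range(8, 12), range(12, 16)]


def random_genome(n_individuals=N_INDIVIDUALS, n_genes=N_GENES, seed=None):
    """Return an n_individuals x n_genes matrix of allele-pair scores.

    Each score is 1 (aa), 2 (Aa) or 3 (AA), drawn uniformly at random
    (an assumption); A is the intelligence-promoting variant.
    """
    rng = random.Random(seed)
    return [[rng.choice((1, 2, 3)) for _ in range(n_genes)]
            for _ in range(n_individuals)]


# Show the first three "genomes" of a reproducible random sample.
for row in random_genome(seed=1)[:3]:
    print(row)
```

Passing a `seed` makes a run reproducible, which a spreadsheet that recalculates its random cells cannot easily do.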
In each of the four colored columns on the right, the sheet calculates each
individual's mean fitness value for the respective cluster of variants. Then,
in the column headed "corrected sum of averages", I applied a mathematical
function that simulates a nonlinear combined effect of all the cluster averages
with a plateau: the product of the four averages raised to the power 0.35. This
exponent was chosen to get a credible range of values between the maximum and
the minimum (shown at the bottom of the column).
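Written as a formula, the "corrected sum of averages" is F = (m1 · m2 · m3 · m4)^0.35, where m1 to m4 are an individual's four cluster means. Since each mean lies between 1 and 3, F can range from 1 to 3^(4 × 0.35) = 3^1.4, about 4.66; these are theoretical bounds, and a random sample of 20 individuals will usually span a narrower range. A minimal Python sketch, where the function names and cluster columns are hypothetical:

```python
import math


def cluster_means(individual, clusters):
    """Mean allele-pair score of each cluster's genes for one individual."""
    return [sum(individual[j] for j in c) / len(c) for c in clusters]


def corrected_fitness(means, power=0.35):
    """Product of the cluster means raised to `power` (0.35 in the spreadsheet)."""
    return math.prod(means) ** power


# Hypothetical cluster columns in a 20-gene genome.
clusters = [range(0, 4), range(4, 8), range(8, 12), range(12, 16)]
print(corrected_fitness(cluster_means([1] * 20, clusters)))  # lower bound: 1.0
print(corrected_fitness(cluster_means([3] * 20, clusters)))  # upper bound: 81 ** 0.35, about 4.66
```

Because x^0.35 is increasing, the exponent compresses the scale without changing how individuals are ranked by the product of their cluster means.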
Next, this overall fitness value is used as the input to a sigmoid function,
adjusted so that the resulting IQ values average around 100 with a standard
deviation of around 15, as real IQ scores do.
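The exact sigmoid used in the spreadsheet is not given, so the following is only one possible way to do this step, with hypothetical parameters: a logistic curve centred on the sample's mean fitness, with steepness `slope` measured in standard-deviation units, followed by a linear rescaling that sets the sample mean to 100 and the standard deviation to 15.

```python
import math
import statistics


def fitness_to_iq(fitness, slope=1.5, target_mean=100.0, target_sd=15.0):
    """Map overall fitness values to IQ-like scores with a logistic curve.

    Hypothetical choices: the logistic is centred on the sample mean of
    `fitness`, `slope` is its steepness in sample-SD units, and the outputs
    are linearly rescaled to exactly `target_mean` and `target_sd`.
    Needs at least two distinct fitness values (otherwise the SD is zero).
    """
    mu = statistics.fmean(fitness)
    sd = statistics.pstdev(fitness)
    squashed = [1.0 / (1.0 + math.exp(-slope * (f - mu) / sd)) for f in fitness]
    s_mu = statistics.fmean(squashed)
    s_sd = statistics.pstdev(squashed)
    return [target_mean + target_sd * (s - s_mu) / s_sd for s in squashed]


iq = fitness_to_iq([1.2, 1.9, 2.4, 2.8, 3.3, 4.1])
print([round(v, 1) for v in iq])
```

Because the rescaling is fitted to each sample, the mean and standard deviation come out at exactly 100 and 15 for any input; `slope` only controls how strongly the extremes are flattened.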
Finally, the upper right corner shows, in color, the correlation coefficient
between each cluster's fitness value and the final IQ. For comparison, the
correlation between each single variant of the clusters and IQ is shown to the
right. As expected, it is considerably lower.
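To reproduce this comparison outside Excel, here is a self-contained Python sketch of the whole chain under hypothetical assumptions: uniform random scores, four clusters of four genes in columns 0-15, the 0.35 power, and a logistic with an arbitrary `slope` rescaled to mean 100 and SD 15. `pearson` computes the Pearson coefficient, the same quantity Excel's CORREL returns (whether the sheet uses CORREL is not stated). The example uses far more than 20 individuals so that sampling noise does not dominate. Because each variant is one of four equally weighted genes in its cluster mean, its correlation with IQ should come out at roughly half (1/√4) of its cluster's.

```python
import math
import random
import statistics

N_GENES = 20
# Hypothetical cluster columns; genes 16-19 stand for "the rest of the genome".
CLUSTERS = [range(0, 4), range(4, 8), range(8, 12), range(12, 16)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)  # fails if a column happens to be constant


def simulate(n_individuals=20, power=0.35, slope=1.5, seed=None):
    """Return (cluster-vs-IQ correlations, single-variant-vs-IQ correlations)."""
    rng = random.Random(seed)
    # Random allele-pair scores: 1 = aa, 2 = Aa, 3 = AA.
    genome = [[rng.choice((1, 2, 3)) for _ in range(N_GENES)]
              for _ in range(n_individuals)]
    # Mean score of each cluster, per individual.
    means = [[statistics.fmean(ind[j] for j in c) for c in CLUSTERS]
             for ind in genome]
    # "Corrected sum of averages": product of the means raised to `power`.
    fitness = [math.prod(m) ** power for m in means]
    # Logistic centred on the mean fitness, then rescaled to mean 100, SD 15.
    mu, sd = statistics.fmean(fitness), statistics.pstdev(fitness)
    squashed = [1 / (1 + math.exp(-slope * (f - mu) / sd)) for f in fitness]
    s_mu, s_sd = statistics.fmean(squashed), statistics.pstdev(squashed)
    iq = [100 + 15 * (s - s_mu) / s_sd for s in squashed]

    cluster_r = [pearson([m[k] for m in means], iq) for k in range(len(CLUSTERS))]
    single_r = [pearson([ind[j] for ind in genome], iq)
                for c in CLUSTERS for j in c]
    return cluster_r, single_r


cluster_r, single_r = simulate(n_individuals=5000, seed=42)
print("cluster vs IQ:", [round(r, 2) for r in cluster_r])
print("single variant vs IQ (mean):", round(statistics.fmean(single_r), 2))
```

With `n_individuals=20`, as in the spreadsheet, the coefficients vary a lot from run to run, which is worth keeping in mind when reading any single recalculation of the sheet.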
An important point is that in a real genome the relevant clusters would
probably comprise more than just 4 genes, so the loss of correlation (and hence
of statistical significance) when studying any single variant would be even
greater, making it still harder to detect that variant's effect as a member of
a cluster influencing intelligence.
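A short derivation shows how fast this loss grows with cluster size. It relies on simplifying assumptions that hold in the spreadsheet model but not necessarily in real genomes: the n genes of a cluster are independent and identically distributed with score variance σ², and they affect IQ only through the cluster mean ḡ, so they are interchangeable. By symmetry, each gene then has the same covariance with IQ as the cluster mean, while the mean's standard deviation is only σ/√n:

```latex
\begin{aligned}
\operatorname{Cov}(\bar g,\mathrm{IQ})
  &= \frac{1}{n}\sum_{i=1}^{n}\operatorname{Cov}(g_i,\mathrm{IQ})
   = \operatorname{Cov}(g_j,\mathrm{IQ}),
  \qquad \sigma_{\bar g} = \frac{\sigma}{\sqrt{n}},\\
\rho(g_j,\mathrm{IQ})
  &= \frac{\operatorname{Cov}(g_j,\mathrm{IQ})}{\sigma\,\sigma_{\mathrm{IQ}}}
   = \frac{\sigma_{\bar g}}{\sigma}\,\rho(\bar g,\mathrm{IQ})
   = \frac{\rho(\bar g,\mathrm{IQ})}{\sqrt{n}}.
\end{aligned}
```

With the spreadsheet's 4-gene clusters, a single variant's correlation with IQ is half that of its cluster; with 16-gene clusters and the same cluster-level correlation, it would be only a quarter.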
------------------------
Note: I am neither a statistician nor a geneticist, so I may have made some
mistakes. On the other hand, maybe this approach is already common in the study
of the genes underlying intelligence, in which case the whole text would be a
pathetic example of adamism (acting as if no one had thought of something
before). So feel free to criticize me at @wysyq. Thank you.