Why is the correlation of individual gene variants with IQ apparently so weak?
The starting point of the following considerations is the fact that the 10 most intelligent dog breeds are morphologically quite different from each other. Since intelligence depends on many genes, and since at least half the genes of our genome are expressed in the brain, we can infer that the intelligence genes that segregated together with the genes determining each breed's unique morphological traits should, for the most part, differ from breed to breed. In other words, there would be different paths to intelligence. And by paths I mean different clusters of gene variants that would lead to the same result - high cognitive performance. Of course, those paths may overlap partially, for example if three of them gave rise to a higher number of dendrites in specific areas of the brain. So, very probably, rather than being strictly additive, their global effect would be subject to a ceiling.
Therefore, the phase space of IQ levels with respect to the hundreds of genes that determine intelligence would not be a simple linear one. Rather, there would be many islands of excellence, each one comprising a handful of different gene variants.
If we assume these premises are reasonable, the view that high intelligence is the result of many small incremental effects of hundreds of genes would be misguided. The frustration of the many scientists involved in the search for "intelligence genes" at the sight of the marginal effect of each one would be the result of considering just individual gene variants. If we knew a priori which definite clusters of variants are involved, their statistical analysis would show more significant results. The problem, of course, is that we cannot have that information ex ante. (By the way, all this reasoning could be applied similarly to any trait other than intelligence.)
In order to illustrate this, I have made a simple simulation in Excel (http://www.c3c.es/inteliclusters.htm) that shows the loss of apparent correlation when considering just single variants, compared with the correlations obtained when considering unique groups of variants.
I have represented the "genome" of 20 individuals, each one consisting of 20 genes. Among them, in color, I show four different clusters of genes. Each of these clusters would by itself have a significant effect on intelligence. The number of columns in white is not relevant for this analysis; just consider them as a summary of "the rest of the genome". In each cell there is a score (1, 2 or 3) assigned randomly as the final "fitness value" of each pair of alleles of a given gene with regard to intelligence. Just imagine 1=aa; 2=Aa; 3=AA, where A would be the variant promoting intelligence.
In each of the four colored columns appearing at the right, the sheet calculates the mean fitness value of the respective cluster of variants for each individual. Then, in the column headed "corrected sum of averages", I have applied a mathematical function to simulate a nonlinear additive effect of all the cluster averages with a plateau; this function is the product of the four averages raised to the power 0.35. This value of the exponent was chosen so as to get a credible range of values between a maximum and a minimum (shown at the end of the column).
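The combining function described above can be sketched in a few lines of Python. This is only an illustration, not the spreadsheet itself: the names `cluster_means`, `combined_fitness` and `CLUSTERS` are made up for the example, and the layout of four clusters of four genes each is an assumption, since the text only states that there are four clusters within 20 genes.

```python
from statistics import mean

POWER = 0.35  # exponent quoted in the text

# Assumed layout (hypothetical): 4 clusters of 4 genes each in a 20-gene genome;
# the last 4 genes stand for "the rest of the genome".
CLUSTERS = [range(0, 4), range(4, 8), range(8, 12), range(12, 16)]


def cluster_means(genotype, clusters=CLUSTERS):
    """Mean score (1-3) of each cluster for one individual."""
    return [mean(genotype[i] for i in idx) for idx in clusters]


def combined_fitness(means, power=POWER):
    """Product of the cluster means, raised to `power`."""
    product = 1.0
    for m in means:
        product *= m
    return product ** power


# Extreme cases: every gene scored 1 versus every gene scored 3.
worst = combined_fitness(cluster_means([1] * 20))
best = combined_fitness(cluster_means([3] * 20))
print(worst, round(best, 2))
```

Under these assumptions the combined value runs from 1 (all scores equal to 1) to about 4.66 (all scores equal to 3), whereas a plain sum of the four averages would run from 4 to 12. Because the exponent is below 1, each further improvement in a cluster adds less than the previous one, which is the plateau effect mentioned above.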
In the next step, this overall fitness value was used as the input for a sigmoid function adjusted in such a way that we get IQ values around 100 with a standard deviation of around 15, like real IQ scores.
Finally, in the upper right corner, you can see in color the correlation coefficients of the fitness value of each cluster vs. the final IQ. As a comparison, you can also see at the right the correlation of the value of any single variant of the clusters with the IQ. As expected, it is markedly lower.
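For readers without Excel, the whole pipeline can be approximated with the Python standard library. This is a rough sketch under several assumptions of my own, not a copy of the spreadsheet: clusters of four genes each, 2000 simulated individuals instead of 20 (to reduce sampling noise), and a logistic curve with made-up parameters (floor 55, span 90, slope 0.7) standing in for the sigmoid. The function names `pearson` and `simulate` are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def simulate(n_individuals=2000, n_genes=20, n_clusters=4,
             cluster_size=4, power=0.35, seed=0):
    rng = random.Random(seed)
    # Random genotype scores: 1=aa, 2=Aa, 3=AA.
    genomes = [[rng.randint(1, 3) for _ in range(n_genes)]
               for _ in range(n_individuals)]
    clusters = [range(c * cluster_size, (c + 1) * cluster_size)
                for c in range(n_clusters)]
    # Mean score of each cluster for each individual.
    avgs = [[sum(g[i] for i in idx) / cluster_size for idx in clusters]
            for g in genomes]
    # Product of the cluster means raised to `power`.
    fitness = [math.prod(a) ** power for a in avgs]
    # Assumed logistic mapping to an IQ-like scale centred on 100.
    mu = sum(fitness) / n_individuals
    sd = math.sqrt(sum((f - mu) ** 2 for f in fitness) / n_individuals)
    iq = [55 + 90 / (1 + math.exp(-0.7 * (f - mu) / sd)) for f in fitness]

    r_cluster = [pearson([a[c] for a in avgs], iq) for c in range(n_clusters)]
    r_gene = [pearson([g[i] for g in genomes], iq) for i in range(n_genes)]
    n_clustered = n_clusters * cluster_size
    # Cluster-level, clustered single genes, and "white" single genes.
    return r_cluster, r_gene[:n_clustered], r_gene[n_clustered:]


r_cluster, r_single, r_white = simulate()
print("cluster means vs IQ:  ", [round(r, 2) for r in r_cluster])
print("single variants vs IQ:", [round(r, 2) for r in r_single])
print("white genes vs IQ:    ", [round(r, 2) for r in r_white])
```

If the sketch behaves like the spreadsheet, every cluster mean should correlate with the simulated IQ clearly better than any single variant inside a cluster, and the "white" genes should hover around zero.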
An important point is that in a real genome the relevant clusters would probably comprise more than just 4 genes, so the loss of significance when studying the effect of any single variant would be greater, further hindering the discovery of any effect of that variant as a member of a cluster influencing intelligence.
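The effect of cluster size can be checked with a small experiment. The sketch below assumes independent genes with equal weight inside a cluster (a simplification of mine, not something a real genome guarantees); in that case the correlation between one gene and the mean of its cluster of k genes is 1/sqrt(k) in theory. The function names are again hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def gene_vs_cluster_mean(k, n_individuals=5000, seed=1):
    """Correlation of the first gene with the mean of its k-gene cluster."""
    rng = random.Random(seed)
    genes = [[rng.randint(1, 3) for _ in range(k)]
             for _ in range(n_individuals)]
    first_gene = [g[0] for g in genes]
    cluster_mean = [sum(g) / k for g in genes]
    return pearson(first_gene, cluster_mean)


# Simulated value next to the theoretical 1/sqrt(k) for growing clusters.
for k in (4, 8, 16):
    print(k, round(gene_vs_cluster_mean(k), 2), round(1 / math.sqrt(k), 2))
```

Since a single variant can only reach the IQ through its cluster, its correlation with the final score shrinks accordingly as the cluster grows.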
Note: I am neither a statistician nor a geneticist, so I could have made some mistakes. On the other hand, maybe this approach is already common for studying the genes underlying intelligence, in which case the whole text would be a pathetic example of adamism. So feel free to criticize me at @wysyq. Thank you.