Your genes can increase your risk of developing a smoking habit. In a great new study, Avshalom Caspi and his colleagues show that you can use individual genomic information to predict (to some degree) who will or will not smoke. I’ll describe this finding and then ask whether medicine is ready to predict your future smoking history by reading your genome.
There have been many studies of genetic risks for smoking, including GWAS (Genome-Wide Association Study) investigations. GWAS are ‘hypothesis-free’: they take a given phenotype (e.g., initiation of smoking) and ask which of thousands of markers for alleles are associated with that phenotype. The results of GWAS research are frequently disappointing. They typically reveal few if any previously unknown genetic predictors of the phenotype. There are at least two reasons for this. First, when you examine thousands of alleles that may be associated with a phenotype, many apparent associations will occur simply by chance. Second, the effect of a single allele is almost invariably small.
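To make the "chance association" problem concrete, here is a back-of-the-envelope sketch. The marker count and the significance threshold are illustrative assumptions of mine, not figures from the study: if you test 10,000 markers that have no real effect at the conventional 0.05 level, about 500 of them will look associated by luck alone.

```python
def expected_chance_hits(n_markers, alpha=0.05):
    """Expected number of markers that pass the threshold by luck alone,
    assuming none of them has any real effect on the phenotype."""
    return n_markers * alpha


def bonferroni_threshold(n_markers, alpha=0.05):
    """Stricter per-marker threshold that keeps the chance of even one
    false hit across all markers at roughly alpha (Bonferroni correction)."""
    return alpha / n_markers


# Illustrative numbers only: 10,000 markers is an assumption, not study data.
print(expected_chance_hits(10_000))
print(bonferroni_threshold(10_000))
```

The Bonferroni correction is one common remedy, not necessarily the one used in the GWAS discussed here. It also shows the trade-off: the stricter the threshold, the harder it is to detect alleles whose true effects are small.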
Caspi and his colleagues attacked this problem in two ways. First, they sought to overcome the “chance association” problem by using only markers for alleles that have been associated with smoking in multiple previous GWAS studies. Second, they dealt with the “small effect” problem by creating a polygenic risk score that combined information about many alleles into a single score. (The score was simply the count of the number of smoking risk markers.) The researchers then looked to see if the risk score predicted the development of smoking in a longitudinal data set.
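A count-based score of this kind is simple enough to sketch in a few lines. This is my own minimal illustration, not the researchers' code: the marker names are hypothetical, and I am assuming each person carries 0, 1, or 2 copies of the risk allele at each marker.

```python
def polygenic_risk_score(genotype, risk_markers):
    """Count the risk alleles a person carries at the selected markers.

    genotype: dict mapping marker id -> copies of the risk allele (0, 1, or 2).
    risk_markers: the markers chosen because earlier studies linked them to smoking.
    Markers missing from the genotype are treated as 0 copies (an assumption).
    """
    return sum(genotype.get(marker, 0) for marker in risk_markers)


# Hypothetical marker ids and genotype, for illustration only.
risk_markers = ["marker_a", "marker_b", "marker_c"]
person = {"marker_a": 2, "marker_b": 0, "marker_c": 1, "unrelated_marker": 2}

print(polygenic_risk_score(person, risk_markers))
```

Note that an unweighted count treats every risk allele as equally important. That is the simplest possible choice, and it matches the description above of the score as a simple count.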
The researchers have followed a cohort of more than 1000 New Zealanders from birth to age 38. From 12 interviews with each participant over these years, they have extensive data on their smoking, including how they started and how much they have smoked over the years. They conceptualize the development of a smoking habit as having three stages: initiation, conversion to daily smoking, and development of heavy nicotine dependence. They also have extensive genetic data which they used to calculate a polygenic smoking risk score for each study participant. They then looked at whether the genetic risk score predicted the smoking trajectories of the participants.
What they found was that polygenic risk had no association with whether the participant started smoking, but participants with a higher risk score (that is, more risk alleles) were more likely to become heavy smokers if they did start.
In Panel A of the Figure, the bar graph shows the distribution of the risk score (shown on the horizontal axis) in the cohort. Notice that there are very few people with very high or very low risks. This bell curve pattern is likely to characterize many genetic risk patterns (because your genetic risk score is, in effect, the result of many nearly independent coin tosses). The superimposed scatterplot and regression line show the relationship between the risk score and the total smoking by the participant up to age 38 (shown on the right-hand vertical axis). These results demonstrate that there is a clear dose-response effect: the more genetic risk factors you have, the more cigarettes you are likely to consume. However, it is only a correlation of r = 0.12, so polygenic risk explains about 1% of the variance in cigarette consumption in the cohort (r² ≈ 0.014). Panel B shows that participants with high polygenic risk were moderately more likely to become nicotine dependent if they started smoking. Panel C shows that high-risk participants who stopped smoking were slightly more likely to relapse.
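Two pieces of arithmetic sit behind this paragraph, and both can be checked with a short sketch. The first is that the share of variance explained is the square of the correlation. The second is the coin-toss picture of the risk score: if each risk allele is an independent toss, the count follows a binomial distribution, which is bell shaped. The number of alleles (12) and the fair-coin probability (0.5) below are assumptions chosen for illustration, not values from the study.

```python
from math import comb


def variance_explained(r):
    """Share of variance in the outcome accounted for by a predictor
    that correlates with it at r (the square of the correlation)."""
    return r ** 2


def risk_score_distribution(n_alleles, p=0.5):
    """Probability of carrying exactly k risk alleles, for k = 0..n_alleles,
    if each allele is an independent coin toss that lands 'risk' with
    probability p (the binomial distribution)."""
    return [
        comb(n_alleles, k) * p ** k * (1 - p) ** (n_alleles - k)
        for k in range(n_alleles + 1)
    ]


print(round(variance_explained(0.12), 4))

# Illustrative only: 12 alleles and p = 0.5 are assumptions.
for k, prob in enumerate(risk_score_distribution(12)):
    print(k, "#" * round(prob * 100))
```

The printed bars pile up in the middle and nearly vanish at the extremes, which is the same shape as the bar graph in Panel A.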
The researchers were fair and judicious in their comments about what this means. On the one hand, they have evidence that this set of alleles affects mechanisms that lead to nicotine dependence. On the other hand, they acknowledge that the size of the association isn’t large enough to be useful for public health. In particular, I do not think it would make sense to routinely sequence the genes of adolescents to counsel them about their risk of becoming nicotine dependent. Some New Zealanders with no risk alleles smoked anyway and quite a few with all the alleles never became dependent. These gene risk factors just do not predict enough about smoking behaviour.
OK, but perhaps that is just the current state of the game in polygenic risk assessment? Could be, but I am skeptical. First, the effects of individual alleles are typically small, so finding a few more markers is unlikely to matter too much. Second, there is a diminishing returns process for the value of additional information, so you might have to quadruple the number of markers to double the predictive power. My belief is that this pattern will be typical for polygenic measures designed to predict individual behaviour. Genes will matter, just not very much.
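The "quadruple to double" claim follows from a very simple model, sketched below. It is my own simplification, not an analysis from the study: assume every marker is independent and explains the same tiny share of the variance, and take "predictive power" to mean the correlation between the score and the outcome. The shares of variance then add up across markers, so the correlation grows only with the square root of the number of markers.

```python
from math import sqrt


def predicted_correlation(n_markers, variance_per_marker):
    """Correlation between a risk score and the outcome, assuming each of
    n_markers independently explains the same share of variance.
    Explained variance adds up, so r = sqrt(n_markers * variance_per_marker)."""
    total = n_markers * variance_per_marker
    if total > 1:
        raise ValueError("markers cannot explain more than all of the variance")
    return sqrt(total)


# Illustrative assumption: each marker explains 0.1% of the variance.
per_marker = 0.001
for n in (10, 40, 160):
    print(n, round(predicted_correlation(n, per_marker), 3))
```

Each fourfold jump in the number of markers only doubles the correlation. Real markers differ in effect size and are not fully independent, so this is a rough guide to the shape of the curve rather than a forecast.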
However, it might be the case that we could combine polygenic information with phenotypic information to produce a substantially more powerful predictive score. For example, adolescents’ self-reports about whether their peers smoke help predict whether the adolescents will smoke. So a risk score that combined genetic information with data on a kid’s behavioural history might give us a powerful predictive tool.
Here again, though, the obstacles are daunting. Developing predictive tools that combine multiple genetic and behavioural factors will require massive data sets. You can imagine mining this information out of very large electronic health record data sets. However, these data sets do not currently have data on smoking or other risk behaviours of sufficient quality to support instrument development, and it will be a long time before they do. We are a very long way from being able to predict individual behavioural trajectories from someone’s genes.