Alan Gilanders asked me to expand on the idea of clusters in terms of
the DNA variants saga and new species.
Clustering is simply the term used to indicate that the particular
features or qualities of a number of characters always occur together as
a group, and never in any other combination. (Think of the set of
differences between two bird species, say New Holland Honeyeater and
Brown Honeyeater, for example.) It is this clustering of physical
characters which taxonomists have always used to differentiate species.
However, you can substitute DNA sequences for the sets of physical
characters and do the same sort of analysis. (Unfortunately, for each
character or position in the sequence there are only four possible
types, the bases A, C, G and T, which can complicate the analysis a bit.)
Say you sequence some DNA from many individuals of a species and find
six differences in the population. At the first difference there are two
forms and you call them A and a. Do the same for the next five
differences, calling them B and b, C and c, D and d, E and e, and F and
f. When you analyse the population you may find that you always get
either (A B C D E F) or (a b c d e f). You never find any other
combination, such as (A b C d E f). In this case you would have two
clusters.
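
As a rough illustration of the six-difference example, here is a small
Python sketch that tallies which combinations turn up in a sample. The
sample itself and the function names are made up for illustration; real
data would come from aligned sequences.

```python
from collections import Counter
from itertools import product

# Hypothetical sample: each string is one individual's form at the six
# variable positions (upper case = first form, lower case = second form).
sample = ["ABCDEF", "abcdef", "ABCDEF", "abcdef", "abcdef", "ABCDEF"]

def observed_combinations(haplotypes):
    """Count how often each combination of forms occurs in the sample."""
    return Counter(haplotypes)

def possible_combinations(letters="ABCDEF"):
    """Every combination that could exist with two forms per position."""
    choices = [(c, c.lower()) for c in letters]
    return {"".join(combo) for combo in product(*choices)}

counts = observed_combinations(sample)
missing = possible_combinations() - set(counts)
print(len(counts), "combinations observed,", len(missing), "never seen")
```

With six two-form differences there are 64 possible combinations, so
finding only two of them is what is meant here by two clusters.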
How does this help determine the presence of multiple species?
This is 'easy' (unless you have to do the work) if you use nuclear
genes. Consider a population of organisms for which you sequence a
number of genes spread throughout the nuclear genome. You find a number
of different combinations of differences for this set of genes. Because
of the occurrence of recombination during reproduction, all the possible
combinations of differences are likely to occur in the population IF the
population is a single species. That is, if you find most or all of the
possible combinations, it is reasonable to conclude that you have one
interbreeding population (ie one species – if you use the biological
species definition). If two species are present then you will find two
clusters of differences. There will be an entire set of combinations
which are never detected. This is good evidence (not proof) for two species.
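
The nuclear-gene argument can be sketched with a toy simulation in
Python. Everything here is an assumption made for illustration: four
unlinked genes, 200 individuals, 20 generations, and a very crude model
in which each offspring takes every gene from one of two randomly chosen
parents. It is not meant as a real population-genetics model.

```python
import random

def next_generation(population, rng):
    """Each offspring takes every locus from one of two random parents.

    Picking each locus independently stands in for recombination between
    unlinked nuclear genes; it is a deliberate simplification.
    """
    offspring = []
    for _ in range(len(population)):
        mother, father = rng.sample(population, 2)
        child = "".join(rng.choice(pair) for pair in zip(mother, father))
        offspring.append(child)
    return offspring

def combinations_after(groups, generations=20, seed=1):
    """Breed each group separately, then pool the distinct combinations."""
    rng = random.Random(seed)
    for _ in range(generations):
        groups = [next_generation(group, rng) for group in groups]
    return {hap for group in groups for hap in group}

upper, lower = ["ABCD"] * 100, ["abcd"] * 100

one_species = combinations_after([upper + lower])  # everyone interbreeds
two_species = combinations_after([upper, lower])   # no breeding between groups

print(len(one_species), "combinations with interbreeding")
print(len(two_species), "combinations without interbreeding")
```

When the two groups never interbreed only the two starting combinations
survive, whereas the interbreeding population fills in many of the 16
possible ones.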
Compare this situation to using mitochondrial DNA.
The mitochondrial genome in a species will gradually change due to
random new mutations and the loss of mutations. But
since mitochondrial DNA doesn't undergo recombination there will be no
mixing to give different combinations of differences, even within a
Now consider a single species. If a mutation occurs in the mitochondrial
DNA of a female then all offspring of that female will have the mutation
(since mitochondrial DNA is inherited from the female parent only). Then
if a second mutation occurs in that line the second mutation will only
ever be found associated with the first mutation. Over time this will
result in a set of mutations or changes which always occur together;
that is a cluster.
If your species occurred as a number of relatively isolated populations
over a period of time in the past, then the mitochondrial DNA will
change differently in the different populations (since mutation is a
random process). That is, the mutations will appear as two or more
clusters.
This will continue even if the populations subsequently mix again to
form one large interbreeding population. This is because there is no
recombination between different mitochondrial DNAs from the two parents.
You get your mother's mitochondrial DNA and that is that. (There are also
other scenarios which could produce the same result.)
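
The same kind of toy model shows why the mitochondrial clusters persist
after the populations mix. Again this is only a sketch under assumed
numbers (200 individuals, 50 generations), and it ignores the difference
between males and females by letting any individual act as a mother.

```python
import random

def maternal_generation(population, rng):
    """Each offspring inherits its mother's mitochondrial type unchanged."""
    return [rng.choice(population) for _ in range(len(population))]

rng = random.Random(3)

# Two mitochondrial types assumed to have arisen while the populations
# were isolated, now pooled into one interbreeding population.
merged = ["ABCDEF"] * 100 + ["abcdef"] * 100

for _ in range(50):  # 50 generations after the populations merged
    merged = maternal_generation(merged, rng)

print(sorted(set(merged)))
```

However many generations you run, no mixed type such as AbCdEf can ever
appear, because nothing in the inheritance step combines two parents.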
The result is that the existence of clustering in mitochondrial
differences is not really evidence of multiple species.
Addendum: the use of sequence similarity to define species.
A huge amount of evidence has shown that mutation is random but averages
out to a roughly constant rate in specific evolutionary lines. Now
no one that I know would openly declare that the degree of sequence
similarity should define a species. That is, having a cutoff where, if
the percent difference is less than the cutoff, it is the same species,
and if greater, it is a different species. However, from the
occasional reference to the idea, I get the sneaking suspicion that many
molecular biologists would like to make it so. It would make life much
But the rate of mutation can be quite different between different
lineages, so a cutoff could well be quite different in different lineages.
We also have only the vaguest idea on what the important differences are
which define two closely related species as being separate. Without this
knowledge it is impossible to say how quickly a single species can split
into two; and thus how many random mutations might have occurred since
The second point is that mutation and loss of mutations from the
population appear to be more or less random. As with any random
process, one needs large numbers of instances to make a statistically
significant conclusion. In the case of the DNA Barcode idea, comparison
of 648 base pairs of DNA is not likely to provide statistically
significant data unless there is a considerable evolutionary difference
between the species. (The species would be so obviously different, why
bother doing it?) If no differences were found, the samples might represent
the same species but there would still be a reasonable probability they
are two separate species. For the same reason, if only one or two
differences are found they could easily represent the natural variation
within a single species.
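
To put rough numbers on the 648 base pair argument, here is a Python
sketch using a simple binomial model. The divergence values are
hypothetical, and the model assumes every site differs independently and
with the same probability, which real sequences do not strictly obey.

```python
from math import comb

BARCODE_LENGTH = 648  # base pairs compared, as in the text

def prob_differences(k, divergence, length=BARCODE_LENGTH):
    """Binomial probability of exactly k differing sites.

    divergence is the assumed per-site chance that two sequences differ.
    """
    return comb(length, k) * divergence**k * (1 - divergence) ** (length - k)

# Hypothetical per-site divergences between two closely related species.
for divergence in (0.001, 0.002, 0.005):
    p_zero = prob_differences(0, divergence)
    p_up_to_two = sum(prob_differences(k, divergence) for k in range(3))
    print(f"divergence {divergence:.3f}: "
          f"P(no differences) = {p_zero:.2f}, "
          f"P(two or fewer) = {p_up_to_two:.2f}")
```

At low divergence the chance of seeing no differences at all, or only
one or two, stays substantial, which is the point about small samples of
a random process.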
The above discussion is somewhat simplified. The complexities can make
analyses more difficult etc but don't change the fundamental ideas.
I hope that helps.