More than 20 years ago, the human genome was reported as completed (Lander et al., 2001). The reference assembly that grew out of this work is maintained by the Genome Reference Consortium (GRC), and its current version is known as the GRCh38 reference sequence. Although it is based on DNA from an anonymous group of donors, about ⅔ of its sequence is derived from a single male donor of African-European ancestry (Genome Reference Consortium, 2023).*
"Completed" meant about 92% complete, because some parts of the human genome were very difficult to sequence with the technology of the last millennium, notably repetitive DNA. Such sequences are especially abundant in specific regions of chromosomes, such as the centromere (Altemose et al., 2022). But even the latest GRCh38 assembly, released in 2017, still contained 151 Mbp of unknown sequence, including pericentromeric and subtelomeric regions, recent segmental duplications, ampliconic gene arrays, and ribosomal DNA arrays. New technology had to be invented to obtain the gapless sequence more than 20 years later (Nurk et al., 2022).
Humans are animals with a double chromosome set in their somatic cells (diplonts). Except for genes on the X and Y chromosomes in males, humans therefore normally have two copies of each gene. The different versions of a gene are called alleles. When our two copies differ, we are heterozygous for this gene. If they are identical, we are homozygous. When DNA is isolated from an individual, the information from the two chromosome sets gets mixed up (squashed), so it is no longer clear which variant came from which chromosome. In addition to having a gapless genome, we therefore also want a haplotype-resolved (phased) genome, which we got about two years ago (Ebert et al., 2021).
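To make the difference between a phased and a "squashed" genome concrete, here is a minimal sketch in Python. The three positions and their alleles are made up for illustration, and the function name `zygosity` is my own choice, not something taken from the cited papers.

```python
def zygosity(allele_a: str, allele_b: str) -> str:
    """Classify a diploid site as homozygous or heterozygous."""
    return "homozygous" if allele_a == allele_b else "heterozygous"


# Hypothetical phased genotypes at three positions: (maternal, paternal).
phased = [("A", "A"), ("C", "T"), ("G", "A")]

# Phased view: we can read each chromosome copy (haplotype) separately.
maternal = "".join(m for m, _ in phased)
paternal = "".join(p for _, p in phased)

# "Squashed" view: sorting the two alleles at each site throws away
# the information about which chromosome each allele came from.
unphased = [tuple(sorted(pair)) for pair in phased]

for position, (m, p) in enumerate(phased, start=1):
    print(position, m, p, zygosity(m, p))
```

In the squashed view, the heterozygous sites still show both alleles, but it is no longer possible to tell whether the T at position 2 and the A at position 3 sit on the same chromosome. That linkage is exactly what a haplotype-resolved assembly preserves.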
Last week, another big step forward in the quest to complete the human genome was taken: The first draft of the human pan-genome was published. This effort by the Human Pangenome Reference Consortium acknowledges the vast differences among humans. Instead of featuring one reference genome, the pan-genome contains “47 phased [haplotype-resolved], diploid assemblies from a cohort of genetically diverse individuals” (Liao et al., 2023). This is not the last word in the quest for the complete human genome: it’s a continuous effort with the next targets on the horizon.
*The publicly funded Human Genome Project (HGP) had a commercial competitor spearheaded by Craig Venter and his company Celera. The publicly and privately funded projects crossed the finish line around the same time (Venter et al., 2001). However, the Celera assembly has not received nearly as much attention and improvement over the years as the HGP assembly, simply because access to the data has been restricted in one way or another (the above-cited publication in Science is still not freely available). Needless to say, the Celera assembly was the genome of an old white man: Craig Venter himself (Singer, 2007).
If you want to sequence your own genome, where can you get that done? A few companies offer Direct-to-Consumer Whole Genome Sequencing services. Even the cheapest service will set you back $300, but given that the sequencing of the first human genome required more than one billion dollars, it's a bargain.
Having no experience of my own, I cannot vouch for any of them, but here are those that appear legitimate to me. Nebula, with its 100x coverage offer, seems to be the most useful to me. However, Sano Genetics promises free testing if you fulfil certain criteria and contribute to their research. In any case, I would make sure that I get not only the interpreted results but also the raw data.
GeneLife (in Japan, https://www.genelife.jp)