Jared Edward Reser Ph.D.
7/18/25
1.0 Introduction
I have dreamed of witnessing a
real-life dinosaur since childhood, but the fact remains that we may never get
our hands on original dinosaur DNA. Unfortunately, the genetic traces (even
those trapped in amber) completely degraded tens of millions of years ago,
a major disappointment for dino aficionados and Jurassic Park fans. But as I
have been watching the advance of artificial intelligence in the last couple
years, I have realized that AI creates new possibilities. Modern machine
learning has an astounding capacity for prediction, for finding hidden
connections, and for cross-referencing. Generative AI excels at constructing domain-relevant
data structures when provided sufficient context and training examples. And agentic
AI is being programmed to plan, organize, and automate complex work at scales
outside the abilities of people. Now in 2025, it is rather easy to imagine a
superintelligent AI in the future tinkering until it brings back a
guesswork-based life-sized biological recreation of a Tyrannosaurus rex. So,
even if creating precise clones of long extinct creatures is impossible, in the
coming years, it does seem possible for advanced AI agents to create
surprisingly accurate guesswork reconstructions of dinosaur DNA.
This entry will discuss the
information available to a superintelligent AI to help it piece together the
clues necessary to do this. It will also propose a three-step pipeline using
machine learning to generate a synthetic dinosaur genome. This pipeline
leverages existing data to gather information about something we don’t have,
dinosaur DNA. Essentially, we use data from modern birds and reptiles, where we
have both the DNA and the bones, to train a system to recognize how genes shape
skeletons. Then, by giving this trained system a dinosaur skeleton, we can ask:
“What kind of genes would have built this?” In essence, it uses paired genomic
and skeletal data from extant birds and reptiles to extract latent genetic
signals and learn structure-function relationships, which we then apply to
fossilized dinosaur skeletons to infer their genomic makeup.
Remember those analogical
reasoning questions from the SAT? They would state two pairs of related things
and then ask you how they share a common relationship. Here is an example:
“Blueprint is to building as
genome is to body plan.”
All these types of questions
would share the pattern A:B::C:D. Well, the technique laid out here shares that
analogical structure. If A (bird skeleton) is to B (bird DNA), as C (dino
skeleton) is to D (dino DNA), then what does D equal? Bird genomes are related
to bird skeletons (both high dimensional vectors) by a complex analogy (a
mathematical function). The math behind that analogy is beyond human capability
to derive, but is tractable for machine learning. That is because it is less conceptual
and more computational. But, once the machine has quantified that analogy it
can apply it to dinosaur fossils to reason about dino DNA. This is because the
way a genome produces a skeleton in birds is analogous to how a genome must
have produced a skeleton in dinosaurs. The full machine learning pipeline is
depicted by the figure in Section 3 below.
2.0 The AI System that Would Be
Necessary
Before we get into specifics, let’s
use this section to discuss the generalities. Let’s describe a general multimodal
AI system that could reverse-engineer dinos by blending paleontology,
comparative genomics, structural biology, and molecular AI. To make a synthetic
approximation of a dinosaur genome, I have been envisioning an autoregressive,
attention-based, neural network AI that is based on the transformer
architecture. Such a system could be trained on genomes, especially genomes
similar to the species that the scientist is trying to resurrect. For
dinosaurs, this would be birds, crocodilians, and various other reptiles
(sauropsids). Think of it like GPT for genes. Instead of predicting the next
word, this system would predict the next nucleotide. In other words, you enter
a sequence of DNA and then it performs statistical autocomplete to make
assumptions about what could come next. There is already research, and proofs
of concept, in this vicinity.
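To make "statistical autocomplete" concrete, here is a deliberately tiny sketch. It is not a transformer: it is a simple k-mer frequency model that counts which nucleotide tends to follow each three-base context and then extends a prompt one base at a time. The function names and the toy sequences are my own placeholders, not real genomes or any published tool.

```python
from collections import Counter, defaultdict

def train_kmer_model(sequences, k=3):
    """Count which nucleotide follows each k-base context in the training data."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for i in range(len(seq) - k):
            context = seq[i:i + k]
            counts[context][seq[i + k]] += 1
    return counts

def predict_next(counts, sequence, k=3):
    """Return the most frequent follower of the last k bases, or None if unseen."""
    followers = counts.get(sequence[-k:])
    if not followers:
        return None
    return followers.most_common(1)[0][0]

def autocomplete(counts, prompt, n_bases, k=3):
    """Extend the prompt one predicted base at a time (greedy decoding)."""
    seq = prompt
    for _ in range(n_bases):
        nxt = predict_next(counts, seq, k)
        if nxt is None:
            break
        seq += nxt
    return seq

# Made-up toy sequences, used only to exercise the code.
training = ["ATGCGATGCGATGCGA", "ATGCGATGCGTT"]
model = train_kmer_model(training, k=3)
completed = autocomplete(model, "ATG", 5)
print(completed)
```

A real genomic language model replaces the lookup table with a neural network that can use context thousands of bases away, but the training signal is the same: predict the next base from the ones before it.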
Existing software tools that use
transformers to make predictions about human genes include DNABERT, Nucleotide
Transformer, and Enformer. Just two weeks ago Google DeepMind released a
transformer-based genomic AI called AlphaGenome. It’s not the first but it is
currently the best glimpse into the applicability of AI for genomics. AlphaGenome is
DeepMind’s powerful new AI model, designed to interpret long DNA sequences (up
to 1 million nucleotides: As, Cs, Ts, and Gs) with base-pair precision. You
give it a DNA sequence, and it can predict important features like where genes
start and stop, how genes interact, and how active certain regions are in
different cells. It helps scientists better understand how our genome works,
how genetic changes might cause disease, and how we might design or edit DNA
more intelligently. There are a few reasons why the generative pretrained
transformer (GPT) architecture works well for genome prediction, as it does for
language. Genomes contain long-range dependencies that resemble linguistic
grammar and narrative structure. This allows such a system to recognize
important patterns in DNA, or any long sequence, that would be invisible to a
human.
Overall, a system like this might be
able to help make educated guesses about bird DNA if it is trained on hundreds
of bird and croc genomes to learn the relevant long-range gene dependencies,
regulatory motifs, and intron-exon patterns. The system could also be trained
to access and reference lots of other data we have on birds and crocs. Thus, the AI system would have to be
multimodal, taking information and building context from as many sources as
possible. Such a model, trained on existing (extant) animals, would provide
guardrails for producing hypothetical birds (class Aves). However, to produce a
specific dinosaur, it would need as much information as possible about the
dinosaur species of interest. Such a system, if empowered by an AGI agent
(which are developing rapidly), could be prompted with dinosaur fossil features
and phylogenetic position to generate probabilistic genome drafts. The genomes
would be conditionally generated rather than inferred by genetic (or
phylogenetic) parsimony. Such a system could make edits and revisions,
iteratively manipulating a conjectural genome to get closer and closer to a
reverse-engineered dinosaur.
Just as AlphaFold revolutionized 3D
protein folding with transformers, models like the one described here might be
capable of generating functional blueprints of extinct species. Given enough
funding, computing power, and research, a system like this could theoretically produce
a genetic sequence capable of being turned into an animal. However, keep in
mind that even if we had a functional genome for a dinosaur it would still be
very difficult to grow / clone it. In fact, as we will see, this proposed project
is much more difficult than it sounds, and is likely beyond the abilities of a
human or even teams of humans. Although after reading this, you might become
convinced that a future superintelligence could tackle it.
So far, this idea is very general and
nebulous. We have a “multimodal superintelligent AI” trained on birds and
reptiles, that somehow takes context from a specific dinosaur species and uses
that to infer genetic sequences. But let’s get more specific and focus on what
information exists within dino skeletons and how it could be extracted.
3.0 Training an AI System to Predict
Dino Genomes Based on Bird Skeletons
My proposed method would start by pretraining
an AI (machine learning neural network model). The model would be taught from
examples to predict skeletal anatomy from whole-genome data from living birds
and crocodilians. Basically, researchers would give the computer a bird genome
as input and teach it to predict the body shape. Like other machine learning
systems, it would strengthen the connections responsible for correct
predictions and weaken the connections responsible for incorrect ones using
backpropagation. This would require linking the variation in genetic sequences
to measurable phenotypic differences in the bones (size, shape, proportions,
articulation geometry).
But the model connecting genes to
bones would need base knowledge before training. It wouldn’t work to start off
by letting a chicken genome predict an entire chicken skeleton. That works for
GPT because it only predicts the next word, but will never work with something
so complex as a skeleton. The system needs strong priors so we would have to
start with individual models for genes and skeletons and then connect them. The
pretrained DNA foundation model would be trained on thousands of genomes across
birds and reptiles to help it compress genomes into a meaningful embedding. Then
a separate model would have to be pretrained on 3D skeletal shapes. The
skeletal AI system would look at things like how limb bones scale across
species, what structural features co-occur (femur angle with pelvic width), and
general anatomical logic. This could involve a pretrained shape encoder that is
trained on thousands of CT scans (or mesh landmarks) from birds and reptiles.
Then these two models (genes and bones) would be aligned and trained as a joint
model (e.g. contrastive learning, variational inference, diffusion bridge) to
associate genome embeddings with morphology embeddings. This would be like
teaching someone who already speaks fluent “genome-talk” and fluent
“anatomy-talk” to become a translator between them instead of having to learn
both languages from scratch at the same time. Merging the models could be done
with relatively modest supervised training data, because the hard work of
representation would have already taken place in pretraining.
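Of the alignment options mentioned above, contrastive learning is the easiest to sketch. The toy below assumes we already have one embedding per genome and one per skeleton (here, invented two-dimensional vectors) and computes a one-directional InfoNCE-style loss: each genome should be more similar to its own skeleton than to any other skeleton in the batch. The name info_nce, the temperature value, and the vectors are illustrative assumptions, not part of any existing system.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(genome_embs, skeleton_embs, temperature=0.1):
    """Mean cross-entropy of matching genome i to skeleton i within the batch."""
    total = 0.0
    for i, g in enumerate(genome_embs):
        logits = [cosine(g, s) / temperature for s in skeleton_embs]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denominator - logits[i]
    return total / len(genome_embs)

# Invented embeddings: row i of each list belongs to the same (hypothetical) species.
genomes = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
skeletons = [[0.9, 0.1], [0.1, 0.9], [-0.9, -0.1]]
mismatched = [skeletons[1], skeletons[2], skeletons[0]]

loss_aligned = info_nce(genomes, skeletons)
loss_mismatched = info_nce(genomes, mismatched)
print(loss_aligned, loss_mismatched)
```

The loss is low when paired embeddings already point the same way and high when the pairs are scrambled. Training the joint model would mean adjusting both encoders to push this number down across many real genome and skeleton pairs.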
This project would necessitate the
genomes and naked skeletons of many reptiles and birds to build a library of
examples (genotype to phenotype training corpus). This may necessitate genome
and skeleton data from thousands of extant birds and crocodilians. There are
over 10,000 species of reptiles and birds each, and this is great, because the
more data the better. As of February 2022, genomes have been completed for over
540 species of birds, including at least one from every bird order. It is
highly possible that other vertebrate (chordate) data, like fish, amphibian,
and mammalian data, could strengthen the model. Either way, many animals’
genomes would have to be sequenced because numerous training examples would be
needed. Distantly related species should be emphasized so the system can
capture the diversity of forms. However, the system would also benefit from
being exposed to intraspecies diversity, meaning that it would help to have
numerous samples from the same species. The more, the better.
The genome is already a linear sequence
of nucleotides ready for machine learning. The skeletons are 3D objects so they
would have to be 3D mapped in a computer and broken down quantitatively into
strings of numbers so that the AI can ingest them. This is addressed by the field
of quantitative morphometrics and there are already large existing datasets. The
morphological traits would be encoded as embeddings in a multidimensional
embedding space where similar traits are located near each other.
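As a rough illustration of how a bone becomes a string of numbers, the sketch below takes a handful of 3D landmark points, removes position (by centering on the centroid) and size (by dividing by centroid size), and flattens what is left into a shape vector. This is only the first part of a standard Procrustes-style workflow, which would also remove rotation. The landmark coordinates are invented, and the function names are my own.

```python
import math

def normalize_landmarks(landmarks):
    """Center 3D landmarks on their centroid and scale them to unit centroid size.

    Returns (normalized_landmarks, centroid_size). Rotation is NOT removed here.
    """
    n = len(landmarks)
    centroid = [sum(p[axis] for p in landmarks) / n for axis in range(3)]
    centered = [[p[axis] - centroid[axis] for axis in range(3)] for p in landmarks]
    centroid_size = math.sqrt(sum(c * c for p in centered for c in p))
    normalized = [[c / centroid_size for c in p] for p in centered]
    return normalized, centroid_size

def flatten(landmarks):
    """Turn a list of [x, y, z] points into one flat feature vector."""
    return [c for p in landmarks for c in p]

# Invented landmarks for a small bone, and the same shape tripled in size and moved.
bone_small = [[0, 0, 0], [0, 0, 10], [1, 0, 5], [-1, 0, 5]]
bone_large = [[5, 5, 5], [5, 5, 35], [8, 5, 20], [2, 5, 20]]

shape_small, size_small = normalize_landmarks(bone_small)
shape_large, size_large = normalize_landmarks(bone_large)
vector_small = flatten(shape_small)
vector_large = flatten(shape_large)
print(size_large / size_small)
```

After normalization the two bones yield the same shape vector, while size is kept as a separate number. That separation matters here, because a dinosaur femur may share its shape with a bird femur while differing enormously in size.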
Then, we would teach the machine to use
the genome to predict what kind of skeleton it would make. A deep neural
network or transformer-like architecture would learn the statistical mappings
from sequence space to shape space. To do this, the AI model would be exposed
to many pairs of genomes and skeletons so it can learn the mappings between
them. It may sound strange to go from DNA to a skeleton, but this is what these
autoregressive systems do: they learn to predict one sequence from another. Contemporary
systems have billions of parameters to tweak in order to memorize the intricate
relationships between the input sequences and their corresponding output
sequences. This forward model would have to be validated with data not used in
training. For example, we could hold back the data from chickens in order to
see how close to an actual chicken skeleton the system can get when given a
chicken genome. If the system performs well, we would take this forward model
and run it backwards.
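The hold-out test described above can be sketched with a stand-in model. Here the "forward model" is just a least-squares line from a single genome-derived number to a single bone measurement, and each species is held back in turn while the line is fitted on the rest. The species labels and all numbers are invented placeholders; a real forward model would be a large neural network, but the validation logic would look much the same.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def leave_one_out_errors(data):
    """Hold out each species in turn, fit on the rest, and record the absolute error."""
    errors = {}
    for held_out in data:
        train = [pair for name, pair in data.items() if name != held_out]
        slope, intercept = fit_line([x for x, _ in train], [y for _, y in train])
        x, y = data[held_out]
        errors[held_out] = abs(slope * x + intercept - y)
    return errors

# Invented pairs: (genome summary score, bone length). species_e breaks the trend.
toy_data = {
    "species_a": (1.0, 8.0),
    "species_b": (2.0, 10.0),
    "species_c": (3.0, 12.0),
    "species_d": (4.0, 14.0),
    "species_e": (5.0, 19.0),
}
errors = leave_one_out_errors(toy_data)
worst = max(errors, key=errors.get)
print(worst, errors[worst])
```

In this toy, the species that departs from the pattern shared by the others is the one predicted worst when held out. That is worth keeping in mind, since every dinosaur will sit outside the range of the living animals used for training.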
Once the forward model is trained, we
would invert it, applying it in reverse to predict plausible genomic sequences
from input skeletal morphology. This is shown in the second step seen in the
figure below. We would feed the inverted model skeletal data and train it to
output genomes. This bi-directional genotype-phenotype modeling is similar to
recent breakthroughs in protein design (AlphaFold → inverse design models like
Chroma, RFdiffusion) where forward prediction unlocks plausible generative
capacity in the inverse direction.
Once the model is adequately trained
to turn skeletal data from living vertebrates into genetic sequences it could
be fed morphometric data from a fossil dinosaur skeleton. This would
necessitate a well-preserved and 3D mapped fossil dinosaur skeleton. After
feeding this data into the inverse model, it would output a distribution of
plausible genomic sequences consistent with bird-reptile sequence-morphology
mappings (sampling from a probabilistic output space). If a bird’s genome
explains its skeleton, then a dinosaur’s skeleton must have hidden information
that can hint at its genome. You can see why this would not generate the true dinosaur
genome. But it would generate a “best-fit plausible genome” grounded in
comparative genomics and constrained by actual fossilized dino anatomy. This
idea resembles recent AI breakthroughs in structure-function inference in
proteins, but here is applied to deep-time vertebrate paleogenomics.
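One simple way to picture the inverse step is "guess and check": propose candidate genotypes, run each through the forward model, and keep the ones whose predicted skeleton matches the fossil. The sketch below does this exhaustively for a made-up forward model with six binary variants that each add a fixed amount to a bone length. The effect sizes, baseline, and target are invented, and a real system would sample from a learned generative model rather than enumerate every possibility.

```python
from itertools import product

# Hypothetical forward model: each binary variant adds a fixed amount (cm) to bone length.
EFFECTS = [4.0, 2.0, 2.0, 1.0, 1.0, 0.5]
BASELINE = 10.0

def forward(genotype):
    """Predict bone length from a tuple of 0/1 variant states."""
    return BASELINE + sum(e for e, g in zip(EFFECTS, genotype) if g)

def invert(target_length, tolerance=0.25):
    """Return every genotype whose predicted length is within tolerance of the target."""
    return [g for g in product((0, 1), repeat=len(EFFECTS))
            if abs(forward(g) - target_length) <= tolerance]

def variant_frequencies(candidates):
    """Fraction of matching genotypes that carry each variant."""
    return [sum(g[i] for g in candidates) / len(candidates)
            for i in range(len(EFFECTS))]

candidates = invert(16.0)
profile = variant_frequencies(candidates)
print(len(candidates), profile)
```

Even in this tiny example, several different genotypes explain the same bone equally well, so the honest output is a frequency profile over variants rather than a single answer. This is the sense in which the pipeline yields a distribution of plausible genomes instead of the true one.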
It is worth mentioning that it
wouldn’t make sense to start by predicting bird genomes from bird skeletons; it
must be done the other way around. The direction of causality plays a role,
because genes cause bodies. Also, many different genotypes could result in
similar skeletons and thus predicting genomes from skeletons requires learning
a much more complex probability distribution. That is why starting by
predicting skeletons from genomes creates a framework of internalized
generative biological relationships which can be used to constrain guesses when
running it backwards on fossils. It is like learning how to bake bread: you
must first learn how ingredients combine to make bread before you can try to
work backwards and guess what ingredients went into a loaf by looking at its
shape and texture. This method may sound outlandish at first,
but it’s actually exactly what these models do and are being trained to get
better at: to exhaustively search for and learn all of the predictive patterns
that can be found between an input sequence and its output sequence.
By training a model on species where
both the genome and skeleton are known, it becomes possible to map out a shared
space where genetic traits and anatomical features co-vary in predictable ways.
It also unearths latent traits, which are hidden patterns that connect DNA and
anatomy. While it will not recover full genomes, it will generate constrained,
probabilistic profiles, possibly offering the most scientifically grounded
glimpse yet into the genomic architecture of extinct species. Of course, adding
more information, aside from just skeletal anatomy, will help us reconstruct
genomes with more precision. Let’s discuss this next.
4.0 What Other Information Might
Assist In This Reconstruction?
We can glean a lot of pertinent
information from fossils. Bone morphology and histology give us size, shape,
posture, musculature, joint angles, growth rate, vascularization, and stress
markers. The bones also give us lots of internal geometry that relates to
internal organs. Comparing fossils of animals of different ages provides
information about the way the bones change over lifespan as well as growth
curves and the hormones and chemicals that might underlie them. This all
informs which developmental genes (e.g., FGF, BMP, Runx2) likely governed these
features. Bones alone could reveal significant genotype-to-phenotype mappings.
But paleontologists have so much more on
dinosaurs than just bones. There are also fossils of soft tissue impressions
that show skin, feathers, and scales, providing copious information that would
be necessary for approximating dino features. These impressions offer a wealth
of information about the shape and composition of soft tissues that would have sat
on top of the bones. Given enough of this information about the 3D structure of
the exterior of the body, it could also be analyzed using the three-step
pipeline above. In that case, it would be compared to the exterior of birds and
reptiles. Just look at this highly preserved (practically mummified) impression
of an ankylosaurus. There are also some findings of dinosaur internal organs
and blood vessels. Data like this provides a wealth of anatomical information
that could be utilized.
There are many other forms of fossils
that could lend meaningful data. Eggs are common and their size, shape, and
composition provide clues. Trace fossils like tracks and footprints add to the
detail and give us information about the feet, the gait, and biomechanics that
could be compared to that of birds. Brain endocasts (bony brain cases) are
often uncovered in fossil specimens and they allow scientists to see the shape
and size of the species’ brain. They also help paleoneurologists compare the
proportional size of their neuroanatomical structures to the brains of crocodilians
and birds. Even gastroliths (stomach stones) and coprolites (poop) offer
mathematical and geometrical details. Scientists have also uncovered
information regarding dinosaur melanocytes and pigmentation that offer strong
clues about the colors of skin, scales, and feathers. Given that artists’
representations of dinosaurs (paleoart) have been improving in scientific rigor
for over a century, an AI system could also utilize pictorial and video CGI
reconstructions of dinosaurs as references.
Information about dinosaur behavior
would also help and thankfully scientists have been developing this for over a
century. Careful study has revealed very specific inferences about nesting,
brooding, pack hunting, mating, sociality, intelligence, and much more. Fossil
burrows, resting traces, feeding traces, and gnaw marks add to the resolution.
There has also been lots of exacting scientific work on ontogenetic
development, pathology, thermoregulatory physiology, isotopic signatures,
paleobotany, and paleoenvironmental reconstruction. All this information helps
us make assumptions about behavior, and comparing it to bird behavior allows us
to constrain neurological and endocrine gene candidates. All taken
together, we have an extraordinary amount of collateral
evidence that can guide probabilistic reconstruction. The multimodal AI
would not just “guess blindly” but would condition its genome drafts on a
broad constellation of well-studied physical, ecological, physiological, and
behavioral constraints.
Most of the present research
on resurrecting dinosaurs involves using phylogenetic methods to look for
conserved and divergent sequences from birds to reconstruct a plausible common
ancestor. Partially reconstructing the common ancestor of all birds,
which researchers are working on now, would be helpful but it would never carry
us to extinct dinos. Currently, scientists can and
do make crude mathematical predictions about ancient DNA from living species. Scientists
have also been able to predict some physical human traits like face or bone
shapes from genes. However, I think the present method could take all this a
lot farther. It seems that no one has proposed a system that could learn to
predict skeleton shape based on DNA from modern birds and crocodiles, and then
use that knowledge in reverse to predict dinosaur DNA from fossil skeletons. I
actually asked GPT, Gemini, Claude, and Grok to search the web and see if
anything like what I am laying out here has already been proposed and they each
reported, after many combined minutes of search, that there is nothing like
this on the internet and that it may be the best starting framework.
5.0 Birds and Reptiles Provide a
Wealth of Genetic and Anatomical Data
Scientists use techniques such as comparative genomics and the molecular clock technique (among others) to map relatedness in vertebrates, and this gives our AI a huge family tree to reference. Remember that birds are technically dinosaurs, so we do have dino DNA, and this could go a long way toward predicting dinosaur genomes, especially certain kinds. Tyrannosaurus rex and Velociraptor, like all birds, were members of Theropoda, a group of bipedal, mostly carnivorous dinosaurs. In fact, birds, T. rex, and Velociraptor all belong to an even more specialized branch called Coelurosauria, which includes feathered dinosaurs and some of the most cognitively advanced species of the Mesozoic. Because birds are themselves coelurosaurian theropods, extinct lineages within Coelurosauria would be the most manageable for a bird-based proxy approach to resurrection. Of course, dromaeosaurs like the raptors, as well as other paravians such as troodontids, would be closer to birds (and easier to piece together) than a T. rex. That is kind of cool for us, because dromaeosaurs like Velociraptor and Utahraptor were likely the most intelligent and agile of all dinosaurs.
The last common ancestor of all birds lived around 100 million years ago in the late Cretaceous. Scientists have employed sophisticated bioinformatics to trace back how bird genomes evolved through deep time, attempting to piece together this genome. But the common ancestor that birds shared with T. rex would have lived 160 million years ago in the Jurassic. That is an additional 60 million years, which unfortunately leaves lots of time for genetic changes. Another helpful landmark, the bird-crocodilian split, happened around 250 million years ago in the Triassic. This was the common archosaur ancestor, and scientists have used advanced comparative genomics techniques to reconstruct an estimated 50% of its genome at 91% accuracy.
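For readers curious about what reconstructing an ancestor looks like at the smallest scale, here is a sketch of Fitch parsimony for a single DNA site. This is a classic textbook method that I am using as a stand-in; the studies mentioned above rely on far more sophisticated statistical models. The tree shape, species labels, and bases are invented for illustration.

```python
def fitch(tree, leaf_states):
    """Fitch parsimony for one site on a rooted binary tree.

    tree is a nested 2-tuple of leaf names; leaf_states maps each leaf to a base.
    Returns (set of most-parsimonious root states, minimum number of changes).
    """
    changes = 0

    def visit(node):
        nonlocal changes
        if isinstance(node, str):           # a leaf: its state is observed
            return {leaf_states[node]}
        left, right = (visit(child) for child in node)
        common = left & right
        if common:                          # children agree on at least one state
            return common
        changes += 1                        # disagreement costs one substitution
        return left | right

    root_states = visit(tree)
    return root_states, changes

# Invented example: three birds and one crocodilian as the outgroup.
tree = ((("bird_1", "bird_2"), "bird_3"), "croc_1")
site = {"bird_1": "A", "bird_2": "A", "bird_3": "G", "croc_1": "G"}
root_states, n_changes = fitch(tree, site)
print(root_states, n_changes)
```

Here the outgroup tips the balance: the ancestor most likely carried a G, with a single change to A on the branch leading to the first two birds. This is why crocodilian genomes are so useful alongside bird genomes.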
But
it is not just birds that can help in this regard. Birds and crocodilians are
living archosaurs, a taxonomic group (clade) that dinosaurs are also nested
inside. That means that crocs (alligators, caimans, gharials, and crocodiles)
have much to offer as well. They offer an excellent contrast with the birds to
help us interpolate about dinosaurs. Genomic comparison of bird vs. croc
genomes identifies both deeply conserved elements and divergent innovations.
Using genomes from birds, crocs, and other reptiles would help us reconstruct
the gene order on chromosomes, predict coding genes present in dinosaurs, and
estimate regulatory architecture. It definitely helps that we have access to
thousands of living archosaur and even coelurosaur genomes to study
and use as references.
Data
from lizards, snakes, and turtles will also contribute. Even reptiles like the
tuatara offer further comparative opportunities given their genetic distance
from other reptiles, their slow-evolving genome, and ancient reptilian traits.
In fact, there are many interesting species, including monotremes (egg laying mammals),
that could help constrain the baseline reptilian architecture upon which
dinosaur genomic traits evolved.
There
are also some fascinating birds that lend comparative details. Large flightless
birds like the cassowary are about as close as modern birds get to true dinos
(their feet look just like the feet of the dinosaurs in Jurassic Park). Two
species of giant moas (Dinornis robustus and Dinornis novaezealandiae)
might provide some profound insights. These towering birds, the only birds without even a vestige of a wing, are now
extinct, although they were hunted as recently as 500 years ago. Over the last
decade ancient-DNA labs have recovered both mitochondrial and sizable nuclear
genomes from several moa species, including the two “giant moa,” one of which was taller than 11 feet and over 600 pounds. Because moa
evolved the largest body masses ever achieved by birds (up to 250 kg)
independently of today’s ratites (ostriches, emus, cassowaries), their genomes
are a natural experiment in avian gigantism. They give us a statistically
powerful way to see which genetic routes birds can and cannot take when they
achieve very large sizes. Moas would supply our dinosaur-inference pipeline
with needed genotype-phenotype pairs at the extreme end of body size.
Some scientists have been attempting to take living birds
and change specific genetic features to create a throwback look. Jack Horner
(the inspiration for Dr. Grant in the Jurassic Park series and technical
advisor on the films) has a project to create a “chicken-o-saurus.” This project
is aimed at creating a modified chicken that expresses dormant dinosaur-like
traits. Dr. Horner wants to use gene-editing tools like CRISPR to counter the
recent genes that made birds less like dinos. He envisions a chicken with
teeth, a long tail, arms with clawed hands, and a rounded snout rather than a
beak. I imagine we would want to remove feathers, the keel on the sternum, and
the pygostyle (fused tail vertebrae). Searching for dormant genes in birds, in
this way, could be a valid technique. Scientists inspired by Horner’s ideas
went on to make a beak-less chicken with a snout that looks very dinosaur-like.
To accomplish this, they found a cluster of genes related to facial development
that exists in birds, but no other animals. They used an inhibitor to suppress these
genes in embryonic chickens and, as you can see, the resulting bird faces
appear much more like their distant dinosaur ancestors.
There are several other de-extinction projects underway
now, but they all involve animals whose entire genome has been recovered from
their remains. Furthermore, all these projects are not really true clones or
resurrections. They all involve taking a related animal and changing a few
genes (similar to the chicken-o-saurus concept). For example, Colossal Biosciences
claims to be bringing animals such as the woolly mammoth, the thylacine, and
dire wolves back from extinction. But in reality, they are taking the extinct
genome as reference and then editing genetic sites in the closest living
relatives to make a proxy animal with key traits. To “create” a woolly mammoth
they are giving Asian elephants “cold resistant” attributes such as fur,
increased fat, altered hemoglobin, and smaller ears. This is achieved by
multiplex gene editing. To create the “dire wolves” they edited 20 sites across
14 genes in gray-wolf DNA using sequences inferred from ancient dire wolf
remains and then cloned the edited cells. Why don’t these companies just build
an animal around the recovered genome? As the next section will explain, that
is just too far beyond today’s technology.
6.0 Dinosaur Embryology
If we had a full, viable, synthetic
Tyrannosaurus rex genome on a computer, bringing it to life would involve a
complex series of technological steps. First it would have to go from zeros and
ones on a computer, to the actual DNA polymer. The genetic code would have to
be synthesized in segments using methods like Gibson assembly or yeast-based
artificial chromosome construction (YAC). Something like this has
been done for microbes, but whole-genome synthesis has not yet been
achieved for animals. The synthetic genome would be inserted into a host cell,
like a de-nucleated bird ovum. This could be done via somatic cell nuclear
transfer (SCNT) similar to the method used for Dolly the sheep. It is worth
mentioning that even though several mammals have been cloned, scientists are
still unable to clone a bird. The machinery of the cell that is hosting our T. rex
genome must recognize it and properly express its proteins, helping it build a
body. There must be no mismatches or incompatibilities with the cellular
(cytoplasmic) environment or with host mitochondrial DNA. This ovum would then
need to be implanted within a surrogate egg (even an ostrich egg is three to four
times smaller than a T. rex egg) or artificial womb. Incubation conditions would
have to align precisely with the embryological development. Post-hatching the
dinosaur would require intensive care, proper diet, temperature, humidity, and
parental, brood, and social interaction. You can see how difficult this would
be. Our technology is not there and not even close right now, but of course AI
may change this rather rapidly. But let’s keep in mind that having vast
information about dinosaur genomes has value outside of de-extinction, such as deepening
our understanding of dinosaur biology and evolution.
7.0 What Other Animals Could Be
Modeled Using this Framework?
Dino genes must be inferred because no
truly intact, sequence-quality dinosaur DNA has ever been recovered and likely never
will. The only genetic traces extracted from dino fossils are highly degraded
molecules (possible chromosome fragments, collagen sequences, and chemical DNA
markers) found inside exceptionally preserved dinosaur cartilage or bone. Even
these findings are controversial and nowhere near the quality needed for genome
sequencing or “de-extinction.” Experiments on ancient bones show DNA’s average
bond half-life is about 521 years at 13 °C. Given this rate, statistical decay
predicts all links would be destroyed after around 6.8 million years, even in
perfect conditions. Unfortunately, non-avian dinosaurs went extinct 66 million
years ago, ten times beyond that limit. In fact, retrieving an entire genome
from the fossil remains of any species becomes very difficult after 100,000
years. After one million years, even if preserved by very cold or dry
conditions, any DNA that is recovered will be fragmentary.
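The decay arithmetic can be sketched in a few lines. The code below assumes simple first-order (exponential) decay and treats the 521-year figure quoted above as a fixed half-life. That is a simplification: the real rate depends strongly on temperature and burial conditions, so the half-life is left as a parameter, and the longer survival limits quoted above correspond to colder, more favorable settings than 13 °C. Because the surviving fraction becomes too small for ordinary floating-point numbers, it is reported as a base-10 logarithm.

```python
import math

HALF_LIFE_YEARS = 521.0  # figure quoted in the text for 13 °C; treated as an assumption

def half_lives_elapsed(age_years, half_life_years=HALF_LIFE_YEARS):
    """Number of half-lives that fit into the given age."""
    return age_years / half_life_years

def log10_fraction_intact(age_years, half_life_years=HALF_LIFE_YEARS):
    """log10 of the fraction of bonds still intact: fraction = 0.5 ** half_lives."""
    return -half_lives_elapsed(age_years, half_life_years) * math.log10(2)

for label, age in [("one half-life", 521), ("66 million years", 66_000_000)]:
    print(f"{label}: {half_lives_elapsed(age):,.1f} half-lives, "
          f"log10(fraction intact) = {log10_fraction_intact(age):,.1f}")
```

At this rate, 66 million years spans on the order of a hundred thousand half-lives, which is why the surviving fraction is effectively zero. Passing a longer half_life_years shows how much colder conditions would stretch the timeline, though never far enough to reach the Mesozoic.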
The fact that DNA can easily survive
10,000 years means that dodos, thylacines (marsupial tigers), woolly mammoths,
and saber-toothed cats would not need the technique I am introducing here to be
cloned and resurrected. But there are many interesting species aside from
dinosaurs to which this technique would need to be applied. These include
ancient animals such as trilobites, eurypterids (sea scorpions), giant
dragonflies, and ammonites; Mesozoic marine reptiles such as plesiosaurs,
mosasaurs, and ichthyosaurs; Pleistocene megafauna such as woolly rhinoceros,
cave lions, and giant ground sloths; early mammal-like reptiles (synapsids)
such as pelycosaurs and cynodonts; and recent human ancestors such as
australopithecines, Homo erectus, and Homo heidelbergensis. It should even work
on plants and fungi because we have many fossils of ancient plants. It is
unclear if a technique like this could stretch back 500 million years ago to
Cambrian animals like Haikouichthys, Anomalocaris, and Hallucigenia in the absence
of close modern relatives.
It is worth mentioning that a process
like the three-tiered pipeline described above could be used to approximate the
faces of our hominin ancestors. Neanderthal, Denisovan, Homo floresiensis, and
other skulls could be entered into an AI model after the model has been trained
on human and ape skull / face pairings. This could be combined with other
techniques to predict the facial features of ancient humans, another sight I
had long thought lost to time before the advent of modern AI.
8.0 Weaknesses of This Approach
The method I have introduced here will
not unearth the actual genomes, just make incredibly informed guesses about them.
However, even an advanced AI system will not be able to reproduce sequences
where evolution introduced significant novelties, lineage-specific adaptations,
or regulatory rewiring that has no modern parallel. All of those actual genetic
mutations and adaptations that dinosaurs made are lost to time. Furthermore,
such a synthetic genome could result in a visually compelling likeness and an
uncanny simulacrum of the artistic renderings but may largely fail to reproduce
internal regulation. The process could result in animals that look like the
dinosaurs in the movies but whose physiology and even behavior is closer to
birds or crocs. This would risk creating a "chimeric reconstruction"
rather than a resurrection.
Chromosome number confuses things. Some
birds have 40 chromosomes, others have over 140, and it is anyone’s guess how
many T. rex had. Regulatory sequences (e.g., promoters, enhancers, and
silencers) control when, where, and how much genes are expressed. They are
often species-specific and evolve rapidly. A T. rex might have had unique
enhancers for muscle growth or bone density that no longer exist in its living
relatives. Epigenetic modifications (DNA methylation and histone modification)
also influence gene activity without altering the DNA sequence. These marks
decompose with the genes, so AI would have to hypothesize working epigenetic
profiles based on modern analogs, adding further uncertainty. Non-coding DNA (DNA that
does not code for proteins, e.g., introns, regulatory elements, and supposed
junk DNA) also poses an issue. It comprises 98-99% of the genome, evolves
differently from coding regions, often contains lineage-specific adaptations,
and lacks clear genotype-phenotype correlations. The present skeletal
morphology technique could capture coding genes linked to bone structure, but
non-coding DNA’s role in overall genome stability, gene regulation, and
phenotypic variation would remain unaddressed.
It is also important to mention that
there are serious ethical and ecological concerns at play here that are outside the
scope of this entry. For instance, humans artificially selected pug dogs to
have a collapsed snout, which made it difficult for them to breathe. There are
many examples of domestication creating disease states, and it is clear that an
engineered dinosaur could be born into an uncomfortable, painful, or diseased
body. Hollywood has already pointed out many of the ethical quandaries of de-extinction,
including animal cruelty, human safety, and invasive ecological concerns.
Currently, you cannot fit an entire
vertebrate genome into the attentional window (context window) of an AI. This means that it
cannot take the entire genome into account when making predictions and that
some long-range dependencies may not be recognized. Bird genomes are around 1
billion base pairs (1.0 to 1.3 gigabase pairs (Gb)) and crocodile genomes
contain around 2 to 3 billion. ChatGPT can only hold about 128,000 tokens at a
time and Google Gemini can hold around one million. Assuming roughly one token
per base pair, that means that the
attentional window needs to be over 1,000 times bigger. The industry has seen
attention doubling every 18 to 24 months. A 1,000-fold increase takes about ten
doublings, so at this rate it would be around 15 to 20
years before a transformer’s window of attention can encompass all of the DNA
in question. Of course, there are many ways to get around this, even today (preprocess
the genome into embeddings, prioritize known relevant loci, and use
hierarchical architectures), but this is just one example of the fact that as
technology grows this idea becomes more feasible.
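The back-of-the-envelope arithmetic behind that estimate can be sketched in a few lines of Python. This is only a rough illustration: treating one token as one base pair is my simplifying assumption (real tokenizers may pack DNA differently), the crocodile figure is just the midpoint of the range quoted above, and the function names are made up for this example.

```python
import math

# Figures quoted in the text. One token per base pair is an assumption
# made only for this estimate; real tokenizers may differ.
GENOME_BP = {"bird": 1.0e9, "crocodile": 2.5e9}
CONTEXT_TOKENS = 1_000_000  # roughly the largest window mentioned above


def doublings_needed(target_tokens: float, current_tokens: float) -> int:
    """Number of doublings for current_tokens to reach target_tokens."""
    if target_tokens <= current_tokens:
        return 0
    return math.ceil(math.log2(target_tokens / current_tokens))


def years_needed(doublings: int, months_per_doubling: float) -> float:
    """Convert a count of doublings into years at a fixed doubling period."""
    return doublings * months_per_doubling / 12.0


for name, bp in GENOME_BP.items():
    n = doublings_needed(bp, CONTEXT_TOKENS)
    fast = years_needed(n, 18)  # optimistic: doubling every 18 months
    slow = years_needed(n, 24)  # pessimistic: doubling every 24 months
    print(f"{name}: {n} doublings, about {fast:.0f} to {slow:.0f} years")
```

The answer is very sensitive to the assumed doubling period, which is itself only a loose trend, so the output should be read as an order-of-magnitude guess rather than a forecast.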
The jump from predicting skeletal
morphology to generating functional genomes capable of producing viable
organisms is a significant one. The mappings are highly nonlinear and
dependent on environmental context. Moreover, a single dinosaur skeleton could be
consistent with multiple plausible genomes, there is no way to actually test whether any are
accurate, and validating which could be biologically viable could be very
difficult. One of the most sobering hard truths about this enterprise is the difficulty
inherent in embryology. To progress from a zygote to an embryo, to a fetus, and to
a healthy young animal, the genetic blueprint must be incredibly internally
consistent. This is easy for nature to accomplish, but just having an AI dream
up (or worse, hallucinate) an animal genome gives little reassurance that there
will not be structural inconsistencies due to the tremendous complexity of gene
interactions. Everything must work together, and work just right, to avoid
developmental failure. Of course, this is a problem that a far-future
superintelligence could solve, but it won’t be solved using the
three-step pipeline outlined above.
But this pipeline may be more useful than it first seems. The
three-step methodology outlined here is not just suited for biology. It could
be used as a generalizable framework for cross-domain translation: where one
set of observable features (like morphology) is used to infer a related, but
unobservable set (like genetics), via a latent, learned manifold. In fact, this
method (train forward, invert, apply to unknowns) is a flexible blueprint for
abductive reasoning via deep representation learning, and it could have
enormous potential beyond biology. You could use it whenever you have
A, B, and C, but not D, and A is to B as C is to D. We could call it bidirectional
manifold mapping for latent inference. It learns to model a latent manifold
that encodes the “grammar” of a domain. Once that space is shaped well enough,
inverting across it becomes a powerful general inference engine.
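To make the "train forward, invert, apply to unknowns" idea concrete, here is a deliberately tiny sketch in Python. Everything in it is a stand-in: the single "genotype score," the two skeletal measurements, and the function names are hypothetical, and a pair of fitted straight lines takes the place of the deep network that would learn the real latent manifold.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx


def train_forward(genotypes, skeletons):
    """Step 1: learn genotype -> skeleton, one fitted line per trait."""
    n_traits = len(skeletons[0])
    return [
        fit_line(genotypes, [s[i] for s in skeletons]) for i in range(n_traits)
    ]


def invert(model, skeleton):
    """Step 2: find the genotype whose predicted skeleton fits best.

    For a linear forward model the least-squares answer has a closed form.
    """
    num = sum(a * (s - b) for (a, b), s in zip(model, skeleton))
    den = sum(a * a for a, _ in model)
    return num / den


# Toy "extant species": a hypothetical genotype score paired with two
# hypothetical skeletal measurements (femur length, skull length).
genotypes = [1.0, 2.0, 3.0, 4.0]
skeletons = [(12.0, 5.5), (14.0, 6.0), (16.0, 6.5), (18.0, 7.0)]

model = train_forward(genotypes, skeletons)

# Step 3: apply the inverted model to a skeleton with no known genotype.
fossil = (30.0, 10.0)
estimate = invert(model, fossil)
print(f"inferred genotype score: {estimate:.2f}")
```

Notice that the fossil lies well outside the range of the training examples, so the estimate is an extrapolation. That is the same position a real model would be in when handed a dinosaur skeleton after training only on living birds and reptiles.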
9.0 Conclusion
One of the fondest memories I have is
of reading Michael Crichton’s Jurassic Park in third grade, before the first
movie came out, and marveling at the idea of extracting dino DNA from a
mosquito trapped in amber. Today, we know it is impossible, but it sure felt
elegant at the time. Even without DNA recovery, AI promises a new kind of
“virtual paleogenetics,” a way to infer and simulate the genomes and
physiologies of extinct organisms using bioinformatics, sophisticated
prediction, and comparative analysis. This is due in part to another scientific concept that enraptured me as a child (and that the original Jurassic Park novel got right): that birds are dinosaurs.
Despite the major hurdles, the method
outlined here could be a good starting point for several reasons. Creating the
first genome-to-skeleton bidirectional models would produce incremental
value at each step, advancing the understanding of genotype-phenotype
interactions. Even without resurrection, these AI-generated dino genomes could
allow synthetic cell lines for studying dinosaur biochemistry in vitro,
organoid models, or simulations of growth curves, muscle structure, and
thermoregulation.
The present method would only produce
a synthetic “consensus dinosaur genome,” not a true historical sequence. It
would be educated guesswork, but firmly grounded in real data. It could even
result in living and breathing organisms that look how we expect dinosaurs to
look, although they may not function or act like the real thing. I believe that
creating a Jurassic Park with cloned approximations will be possible when
methods like those discussed here are in the hands of superintelligent AI agents.
In fact, I wouldn’t be surprised if this fantasy were brought to life within our
lifetimes.
Note: I started writing this blog after having a strong sense that artificial superintelligence could achieve dinosaur resurrection, and I wanted to provide a description of that vision. But after posting it, I knew something was missing. So I sat on the couch for 10 minutes and racked my brain, telling myself over and over that there was something important I was missing. And then somehow the idea for the three-step pipeline basically entered my mind fully formed. It felt like the idea just materialized, possibly from unconscious incubation, because there was next to zero reasoning involved. It’s not going to be easy to implement, but I do think it could reach and harness key latent information hidden in dinosaur fossils.