Jared Edward Reser Ph.D.
7/18/25
Citation for this post:
Reser, J.E. (2025).
AI-Mediated Reconstruction of Dinosaur DNA from Fossil Morphology and Extant
Genomes (1.0). AI Thought. https://doi.org/10.5281/zenodo.17604519
1.0 Introduction
I have dreamed of witnessing a
real-life dinosaur since childhood, but the fact remains that we may never get
our hands on original dinosaur DNA. Unfortunately, the genetic traces (even
those trapped in amber) degraded completely tens of millions of years ago,
a major disappointment for dino aficionados and Jurassic Park fans. But as I
have been watching the advance of artificial intelligence over the last couple of
years, I have realized that AI creates new possibilities. Modern machine
learning has an astounding capacity for prediction, for finding hidden
connections, and for cross-referencing. Generative AI excels at constructing domain-relevant
data structures when provided sufficient context and training examples. And agentic
AI is being programmed to plan, organize, and automate complex work at scales
outside the abilities of people. Now in 2025, it is rather easy to imagine a
superintelligent AI in the future tinkering until it brings back a
guesswork-based life-sized biological recreation of a Tyrannosaurus rex. So,
even if creating precise clones of long extinct creatures is impossible, in the
coming years, it does seem possible for advanced AI agents to create
surprisingly accurate guesswork reconstructions of dinosaur DNA.
This entry will discuss the
information available to a superintelligent AI to help it piece together the
clues necessary to do this. It will also propose a three-step pipeline using
machine learning to generate a synthetic dinosaur genome. This pipeline
leverages existing data to gather information about something we don’t have,
dinosaur DNA. Essentially, we use data from modern birds and reptiles, where we
have both the DNA and the bones, to train a system to recognize how genes shape
skeletons. Then, by giving this trained system a dinosaur skeleton, we can ask:
“What kind of genes would have built this?” In essence, it uses paired genomic
and skeletal data from extant birds and reptiles to extract latent genetic
signals and learn structure-function relationships, which we then apply to
fossilized dinosaur skeletons to infer their genomic makeup.
Allow me to offer an analogy... involving analogies. Remember those analogical
reasoning questions from the SAT? They would state two pairs of related things
and then ask you how they share a common relationship. Here is an example:
“Blueprint is to building as
genome is to body plan.”
All these types of questions
would share the pattern A:B::C:D. Well, the technique laid out here shares that
analogical structure. If A (bird skeleton) is to B (bird DNA), as C (dino
skeleton) is to D (dino DNA), then what does D equal? Bird genomes are related
to bird skeletons (both high dimensional vectors) by a complex analogy (a
mathematical function). The math behind that analogy is beyond human capability
to derive, but is tractable for machine learning. That is because it is less conceptual
and more computational. But, once the machine has quantified that analogy it
can apply it to dinosaur fossils to reason about dino DNA. This is because the
way a genome produces a skeleton in birds is analogous to how a genome must
have produced a skeleton in dinosaurs.
2.0 The AI System that Would Be
Necessary
Before we get into specifics, let’s
use this section to discuss the generalities. Let’s describe a general multimodal
AI system that could reverse-engineer dinos by blending paleontology,
comparative genomics, structural biology, and molecular AI. To assemble a synthetic
approximation of a dinosaur genome, I have been envisioning an autoregressive,
attention-based, neural network AI that is based on the transformer
architecture. Such a system could be trained on genomes, especially genomes
similar to the species that the scientist is trying to resurrect. For
dinosaurs, this would be birds and crocodilians (archosaurs), and various other reptiles
(sauropsids). Think of it like GPT for genes. Instead of predicting the next
word, this system would predict the next nucleotide (or codon/amino acid). In other words, you enter
a sequence of DNA and then it performs statistical autocomplete to make
assumptions about what could come next. There is already research, and proofs
of concept, in this vicinity.
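To make the "statistical autocomplete" idea concrete, here is a minimal sketch in Python. It is not a transformer; it swaps in a much simpler k-mer counting model that predicts the next nucleotide from the previous three. The function names and the two toy sequences are my own illustrative assumptions, not real genomic data, but the prompt-then-predict loop has the same shape as the one described above.

```python
from collections import Counter, defaultdict

def train_kmer_model(sequences, k=3):
    """Count which nucleotide follows each k-base context in the training data."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for i in range(len(seq) - k):
            counts[seq[i:i + k]][seq[i + k]] += 1
    return counts

def next_base_probs(counts, context, k=3):
    """Estimate P(next base | last k bases) with add-one smoothing."""
    seen = counts.get(context[-k:], Counter())
    total = sum(seen.values()) + 4  # 4 pseudo-counts, one per base
    return {base: (seen[base] + 1) / total for base in "ACGT"}

def autocomplete(counts, prompt, n_bases, k=3):
    """Greedily extend a DNA prompt one nucleotide at a time."""
    out = prompt
    for _ in range(n_bases):
        probs = next_base_probs(counts, out, k)
        out += max("ACGT", key=lambda base: probs[base])
    return out

# Toy, made-up sequences (assumption: stand-ins for real training genomes).
toy_genomes = ["ATGCGATGCGATGCGA", "ATGCGATGCGTTGCGA"]
model = train_kmer_model(toy_genomes, k=3)
completion = autocomplete(model, "ATG", 6)
print(completion)
```

A real genomic language model would replace the counting table with learned attention over far longer contexts, but the idea of conditioning on what came before and emitting the most probable continuation is the same.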
Existing software tools that use
transformers to make predictions about human genes include DNABERT, Nucleotide
Transformer, and Enformer. Just two weeks ago Google Deep Mind released a
transformer-based genomic AI called AlphaGenome. It’s not the first but it is
currently the best glimpse into the applicability of AI for genomics. AlphaGenome is
Deep Mind’s powerful new AI model, designed to interpret long DNA sequences (up
to 1 million nucleotides (As, Cs, Ts, & Gs)) with base-pair precision. You
give it a DNA sequence, and it can predict important features like where genes
start and stop, how genes interact, and how active certain regions are in
different cells. It helps scientists better understand how our genome works,
how genetic changes might cause disease, and how we might design or edit DNA
more intelligently. There are a few reasons why the generative pretrained
transformer (GPT) architecture works well for genome prediction, as it does for
language. Genomes contain long-range dependencies that resemble linguistic
grammar and narrative structure. This allows such a system to recognize
important patterns in DNA and understand their repercussions for later sequences.
Overall, a system like this, if trained on hundreds of reptile genomes to learn the relevant long-range dependencies, regulatory motifs, and intron-exon patterns, might be
able to help make educated guesses about the DNA of any reptile (and remember, dinosaurs are reptiles). An AI system that also uses language and thinking could be trained
to access and reference lots of other herpetological data. Thus, the AI would have to be
multimodal, taking information and building context from as many sources as
possible. Such a model, trained on existing (extant) animals, would provide
guardrails for producing hypothetical reptiles. However, to produce a
specific dinosaur, it would need as much information as possible about the
dinosaur species of interest. Luckily, we know a lot about a lot of dinosaurs. Such a system, if empowered by agentic AGI (which is developing rapidly), could take this on as a long-horizon project. Not just a chatbot, it would plan on its own, breaking the assignment down into subtasks and creating workflows. Prompted with a fossil from a specific dinosaur species and its phylogenetic position, it would research how best to generate probabilistic genome drafts. These genome drafts would be conditionally generated rather than inferred by genetic (or
phylogenetic) parsimony. Such a system could make edits and revisions,
iteratively manipulating a conjectural genome to get closer and closer to a
reverse-engineered dinosaur.
Just as AlphaFold mastered complex 3D
protein folding with transformers, models like the one described here might be
capable of generating functional blueprints of extinct species. Given enough
funding, computing power, and research, a system like this could theoretically produce
a genetic sequence capable of being turned into an animal. However, keep in
mind that even if we had a functional genome for a dinosaur it would still be
very difficult to grow / clone it. In fact, as we will see, this proposed project
is much more difficult than it sounds, and is likely beyond the abilities of a
human or even teams of humans. After reading this though, you might become
convinced that a future superintelligence could tackle it.
So far, this idea is very general and
nebulous. We have a “multimodal superintelligent AI” trained on birds, crocodiles, turtles, snakes, and lizards, that somehow takes context from a specific dinosaur species and uses
that to infer genetic sequences. But let’s get more specific and focus on what
information exists within dinosaur skeletons and how it could be extracted.
3.0 Training an AI System to Predict
Dino Genomes Based on Bird Skeletons
My proposed method would start by pretraining two linked neural network models: one to work with skeletons and the other to work with genomes. These two models would then work together, learning from
examples to predict skeletal anatomy from whole-genome data coming from living reptiles. Basically, researchers would give the computer a bird genome
as input and teach it to predict the outlines of the skeleton. Like other machine learning
systems, it would strengthen the connections responsible for correct
predictions and weaken the connections responsible for incorrect ones using
backpropagation. It would know how to provide supervised feedback because it has the actual "ground truth" genomes and skeletons taken from the living animals. This learning would require linking the variation in genetic sequences
to measurable phenotypic differences in the bones (size, shape, proportions,
articulation geometry).
But the models connecting genes to bones would need base knowledge before training. It wouldn’t work to start off by letting a chicken genome predict an entire chicken skeleton from scratch. That works for language models because they only predict the next word, but it would never work with something as complex as a skeleton. The system needs strong priors, so we would have to start with individual pretrained models for genes and skeletons and then connect them. It is not clear if the pretrained DNA foundation model should be trained on every genome we have, or just animals, or just vertebrates. But it would need to be exposed to thousands of genomes so it knows how to compress them into meaningful embeddings.
Then
a separate model would have to be pretrained on 3D skeletal shapes. The
skeletal AI system would look at things like how limb bones scale across
species, what structural features co-occur (femur angle with pelvic width), and
general anatomical logic. This could involve a pretrained shape encoder that is
trained on thousands of CT scans (or mesh landmarks) from birds and reptiles.
Then these two models (genes and bones) would be aligned and trained as a joint
model (e.g. contrastive learning, variational inference, diffusion bridge) to
associate genome embeddings with morphology embeddings. This approach would be like
teaching someone who already speaks fluent “genome-talk” and fluent
“anatomy-talk” to become a translator between them instead of having to learn
both languages from scratch at the same time. Merging the models could be done
with relatively modest supervised training data, because the hard work of
representation would have already taken place in pretraining.
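Of the alignment options mentioned above, contrastive learning is the easiest to sketch. The Python below computes an InfoNCE-style loss that is low when each genome embedding is most similar to the skeleton embedding from the same species, and high otherwise. The two-dimensional embeddings are invented placeholders, and the function names and temperature value are assumptions; in practice the embeddings would come from the two pretrained encoders and the loss would be minimized by gradient descent.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(genome_embs, skeleton_embs, temperature=0.1):
    """Average contrastive loss; row i of each list is the same species."""
    losses = []
    for i, g in enumerate(genome_embs):
        logits = [cosine(g, s) / temperature for s in skeleton_embs]
        top = max(logits)  # subtract the max for numerical stability
        log_norm = top + math.log(sum(math.exp(x - top) for x in logits))
        losses.append(log_norm - logits[i])  # -log softmax of the true pair
    return sum(losses) / len(losses)

# Placeholder embeddings for two species (assumption: not real encoder output).
genome_embs = [[1.0, 0.0], [0.0, 1.0]]
aligned_skeletons = [[0.9, 0.1], [0.1, 0.9]]
shuffled_skeletons = [[0.1, 0.9], [0.9, 0.1]]

loss_aligned = info_nce(genome_embs, aligned_skeletons)
loss_shuffled = info_nce(genome_embs, shuffled_skeletons)
print(loss_aligned < loss_shuffled)
```

Training the joint model would amount to nudging both encoders until the matched genome-skeleton pairs score like the aligned case rather than the shuffled one.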
This project would necessitate the genomes and naked skeletons of many reptiles and birds to build a library of examples (genotype to phenotype training corpus). This may necessitate genome and skeleton data from thousands of extant reptiles. There are over 10,000 species of reptiles and birds each, and this is great, because the more data the better. As of February 2022, genomes have been completed for over 540 species of birds, including at least one from every bird order. This is important because distantly related species will help the system capture the diversity of forms. However, the system would also benefit from being exposed to intraspecies diversity, meaning that it should help to have numerous samples from the same species. The more, the better. It is highly possible that other vertebrate (chordate) data, like fish, amphibian, and mammalian data, could strengthen the model's ability to work with dinosaurs. Either way, many animals’ genomes would have to be sequenced because numerous training examples would be needed.
How would we prepare the data for the computer? The genome is already a linear sequence
of nucleotides ready for machine learning (e.g. A, C, T, G...). The skeletons are physical objects so they
would have to be 3D mapped in a computer and broken down quantitatively into
strings of numbers so the AI can ingest them. This is addressed by the field
of quantitative morphometrics and there are already large existing datasets for many animal species. The
morphological traits would be encoded as embeddings in a multidimensional
embedding space where similar traits are located near each other.
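As a rough illustration of this data preparation, the Python sketch below one-hot encodes a DNA string and normalizes a set of 3D skeletal landmarks by centering them and scaling to unit centroid size. Those are the translation and scaling steps of a Procrustes-style superimposition from geometric morphometrics; the rotation step is left out for brevity. The function names and the four toy landmark coordinates are assumptions made for the example, not measurements from any specimen.

```python
import math

BASES = "ACGT"

def one_hot(seq):
    """Encode each nucleotide as a 4-element vector (unknown bases become all zeros)."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]

def normalize_landmarks(points):
    """Center 3D landmarks on their centroid and scale them to unit centroid size."""
    n = len(points)
    centroid = [sum(p[i] for p in points) / n for i in range(3)]
    centered = [[p[i] - centroid[i] for i in range(3)] for p in points]
    # Centroid size: square root of the summed squared distances to the centroid.
    size = math.sqrt(sum(c * c for p in centered for c in p))
    return [[c / size for c in p] for p in centered]

# Toy inputs (assumption: placeholders for a real genome and a real CT-derived mesh).
encoded = one_hot("ACGTN")
toy_landmarks = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0]]
shape = normalize_landmarks(toy_landmarks)

print(encoded[0], encoded[4])
print(shape[0])
```

After this step a sparrow and an ostrich can be compared on shape alone, with overall size stored as a separate number. The learned embeddings described above would be built on top of inputs like these.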
Then, we would teach the machine to use the genome to predict what kind of skeleton it would make. A deep neural network using a transformer-like architecture would learn the statistical mappings from sequence space to shape space.
Even an expert geneticist may feel doubtful about the ability of a machine learning system to fluently translate between DNA and skeletons. But keep in mind that this is what these
autoregressive systems do: they are uncanny at learning how to predict one sequence from another. Contemporary
systems have billions of parameters to tweak in order to memorize the intricate
relationships between the input sequences and their corresponding output
sequences. And there would be ways to test the model to see if it is performing accurately. The model would have to be validated with data not used in
training. For example, we could hold back the data from chickens in order to
see how close to an actual chicken skeleton the system can get when given a
chicken genome. If the system performs well, we would move to the second step in the pipeline, taking this forward model
and running it backwards.
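The chicken hold-out test can be sketched in a few lines of Python. Here a straight-line fit stands in for the neural network, a single number stands in for the genome, and a single number stands in for the skeleton. All the values are invented placeholders chosen to lie on a line, and the function names are my own, so this only demonstrates the validation logic: fit on every species except one, then score the prediction for the species that was held back.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input feature and one output measurement."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def leave_one_species_out(data, held_out):
    """Train without one species, then measure the prediction error on it."""
    train = {name: pair for name, pair in data.items() if name != held_out}
    xs = [pair[0] for pair in train.values()]
    ys = [pair[1] for pair in train.values()]
    slope, intercept = fit_line(xs, ys)
    x, y_true = data[held_out]
    y_pred = slope * x + intercept
    return y_pred, abs(y_pred - y_true)

# species: (genome-derived feature, skeletal measurement). Invented numbers only.
toy_data = {
    "chicken": (0.2, 1.4),
    "duck": (0.3, 1.6),
    "emu": (0.8, 2.6),
    "alligator": (0.5, 2.0),
}
prediction, error = leave_one_species_out(toy_data, "chicken")
print(round(prediction, 3), round(error, 6))
```

With real data the error would not be near zero, and repeating the hold-out for every species in turn would show which branches of the family tree the model handles well and which it does not.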
Once the forward model is fully trained, we
would invert it, applying it in reverse to predict plausible genomic sequences
from input skeletal morphology. This is shown in the second step seen in the
figure below. We would feed the inverted model skeletal data and train it to
output genomes. This bi-directional genotype-phenotype modeling is similar to
recent breakthroughs in protein design (e.g. Chroma, RFdiffusion) where forward prediction unlocks plausible generative
capacity in the inverse direction.
Once the model is adequately trained
to turn skeletal data from living vertebrates into genetic sequences it could
be fed morphometric data from a fossil dinosaur skeleton. This would
necessitate a well-preserved and 3D mapped fossil dinosaur skeleton. After
feeding this data into the inverse model, it would output a distribution of
plausible genomic sequences consistent with bird-reptile sequence-morphology
mappings (sampling from a probabilistic output space). If a bird’s genome
explains its skeleton, then a dinosaur’s skeleton must have hidden information
that can hint at its genome. You can see why this would not generate the true dinosaur
genome. But it would generate a “best-fit plausible genome” grounded in
comparative genomics and constrained by actual fossilized dino anatomy. This general idea resembles recent AI breakthroughs in structure-function inference in
proteins, but here it is applied to deep-time vertebrate paleogenomics.
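One simple way to picture "a distribution of plausible genomic sequences" is rejection sampling, sometimes called approximate Bayesian computation. This is a swapped-in stand-in for the learned inverse model, not the method itself. In the Python sketch below, a hypothetical two-parameter forward model plays the role of the trained genome-to-skeleton network, and we keep every randomly drawn "genome" whose predicted skeletal measurement lands close to the fossil's. The forward formula, the target value, and all names are assumptions for illustration.

```python
import random

def forward_model(latent):
    """Hypothetical stand-in for the trained network: two genome factors to one bone measurement."""
    a, b = latent
    return 2.0 * a + 1.0 * b

def sample_plausible_latents(target, n_candidates=20000, tolerance=0.05, seed=0):
    """Keep candidate genome factors whose predicted skeleton matches the fossil."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_candidates):
        candidate = (rng.uniform(0.0, 1.0), rng.uniform(0.0, 1.0))
        if abs(forward_model(candidate) - target) <= tolerance:
            accepted.append(candidate)
    return accepted

# Assumed fossil measurement in the same arbitrary units as the forward model.
plausible = sample_plausible_latents(target=1.5)
spread = max(a for a, _ in plausible) - min(a for a, _ in plausible)
print(len(plausible) > 0, spread > 0.3)
```

The accepted candidates spread out along a whole line of solutions instead of collapsing to one point. That is the many-genotypes-to-one-skeleton problem in miniature, and it is why the output has to be treated as a probability distribution and narrowed down with additional evidence.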
It is worth mentioning that it
wouldn’t make sense to start by predicting bird genomes from bird skeletons; it
must be done the other way around. Step one in this pipeline is essential because the direction of causality matters:
genes cause bodies. Also, many different genotypes could result in
similar skeletons and thus predicting genomes from skeletons requires learning
a much more complex probability distribution. That is why starting by
predicting skeletons from genomes creates a framework of internalized
generative biological relationships which can be used to constrain guesses when
running it backwards on fossils. It is like learning how to bake bread: you
must first learn how ingredients combine to make bread before you can try to
work backwards and guess what ingredients went into a loaf by looking at its
shape and texture. Again, this method may sound outlandish at first,
but it’s actually exactly what these models do and are being full-stack optimized to get
better at: to exhaustively search for and learn all of the predictive patterns
that can be found between an input sequence and its output sequence.
By training a model on species where
both the genome and skeleton are known, it becomes possible to map out a shared
space where genetic traits and anatomical features co-vary in predictable ways.
It also unearths latent traits, which are hidden patterns that connect DNA and
anatomy. While it will never recover full genomes, it will generate constrained,
probabilistic profiles, possibly offering the most scientifically grounded
glimpse yet into the genomic architecture of extinct species. Of course, adding
more information, aside from just skeletal anatomy, will help us reconstruct
genomes with even more precision. Let’s discuss this next.
4.0 What Other Information Might
Assist In This Reconstruction?
We can glean a lot of pertinent
information from fossils. Bone morphology gives us size, shape,
posture, musculature, joint angles, growth rate, vascularization, and stress
markers. Many fossilized bones can be studied under a microscope, revealing histological information. The bones also give us lots of internal geometry that relates to
internal organs. Comparing fossils of animals of different ages provides
information about the way the bones change over lifespan as well as growth
curves and the hormones and chemicals that might underlie them. This all
informs which developmental genes (e.g., FGF, BMP, Runx2) likely governed these
features. Bones alone could reveal significant genotype-to-phenotype mappings.
But paleontologists have so much more on
dinosaurs than just bones. There are also fossils of soft tissue impressions
that show skin, feathers, and scales, providing copious information that would
be necessary for approximating dino features. These impressions offer a wealth
of information about the shape and composition of soft tissues that would have sat
on top of the bones. Given enough of this information about the 3D structure of
the exterior of the body, it could also be analyzed using the three-step
pipeline above. In that case, it would be compared to the exterior of birds and
reptiles. Just look at this highly preserved (practically mummified) impression
of the ankylosaur Borealopelta. There are also some findings of dinosaur internal organs
and blood vessels. Data like this provides a wealth of anatomical information
that could be utilized. Even if you don't have fossilized internal organs for the species you are interested in, other examples could work as informative index fossils.
There are many other forms of fossils that could lend meaningful data to DNA reconstruction. Fossilized eggs are common and their size, shape, and composition provide clues. Sometimes, although rarely, these eggs can contain fossilized embryos. Trace fossils like tracks and footprints add to the detail and give us information about the feet, the gait, and biomechanics that could be compared to that of birds. Even gastroliths (stomach stones) and coprolites (poop) offer mathematical and geometrical details. Scientists have also uncovered information regarding dinosaur melanosomes (pigment-bearing organelles) and pigmentation that offers strong clues about the colors of skin, scales, and feathers. Given that artists’ representations of dinosaurs (paleoart) have been improving in scientific rigor for over a century, an AI system could also utilize pictorial and video CGI reconstructions of dinosaurs as references. Brain endocasts (casts of the inside of the bony braincase) are often recovered from fossil specimens and they allow scientists to see the shape and size of the species’ brain. They also help paleoneurologists compare the proportional size of their neuroanatomical structures to the brains of crocodilians and birds. You might be interested in what I have written about using AI to predict brain structure from endocasts here:
https://www.observedimpulse.com/2025/09/how-ai-could-be-used-to-reconstruct.html
Information about dinosaur behavior would also help and thankfully scientists have been accumulating this for over a century. Careful study has revealed very specific inferences about nesting, brooding, pack hunting, mating, sociality, intelligence, and much more. Fossil burrows, resting traces, feeding traces, and gnaw marks add to the resolution. There has also been lots of exacting scientific work on ontogenetic development, pathology, thermoregulatory physiology, isotopic signatures, paleobotany, and paleoenvironmental reconstruction. All this information helps us make assumptions about behavior, and comparing it to bird behavior allows us to constrain neurological and endocrine gene candidates. All taken together, we have an extraordinary amount of collateral evidence that can guide probabilistic reconstruction. Our multimodal agentic AI would not just “guess blindly” but would condition its genome drafts on a broad constellation of well-studied physical, ecological, physiological, and behavioral constraints.
This could lead to the discovery that certain dinosaurs had interesting soft tissue structures that did not fossilize. It is commonly pointed out that when artists reconstruct dinosaurs from their fossil bones, they basically "shrink wrap" the bones with skin and that this may lead to the omission of parts that don't fossilize. If one were to look only at the bones in the tail of a beaver, one might miss the fact that it’s a giant paddle, grossly misrepresenting the animal. But the present approach could help in this regard. It’s possible that it could help recognize the existence of various soft tissue structures, such as dewlaps (extendable throat fans), expandable frills, casques, horns, head crests, inflatable throat sacs, wattles, snoods, combs, and back or tail crests.
Currently, scientists can and do make crude mathematical predictions about ancient DNA from living species. Most of the present research on resurrecting dinosaurs involves using phylogenetic methods to look for conserved and divergent sequences from birds to reconstruct a plausible common ancestor. Reconstructing the genome of the common ancestor of all birds, which can only be done partially and which researchers are working on now, would be helpful, but it wouldn’t carry us to extinct dinos. I think it will really help to squeeze the latent genetic information out of the bones themselves. And I looked; there is precedent for this kind of work. There is already published research where scientists have been able to make correct predictions about certain physical human traits like face or bone shapes from genes alone. However, I think the present method could take all this a lot further.
It seems that no one has proposed a system that could learn to predict skeleton shape based on DNA coming from modern birds and crocodiles, and then use that knowledge in reverse to predict dinosaur DNA from fossil skeletons. I actually asked GPT, Gemini, Claude, and Grok to search the web and see if anything like what I am laying out here has already been proposed and they each reported, after many combined minutes of search, and several "deep research" sessions, that there is nothing like this on the internet and that it may be the best starting framework for resurrecting dinosaurs.
5.0 Birds and Reptiles Provide a
Wealth of Genetic and Anatomical Data
Scientists use techniques such as comparative genomics and the molecular clock technique (among others) to map relatedness in vertebrates and this gives our AI a huge family tree to reference. Remember that birds are technically dinosaurs, so we do have dino DNA and this could go a long way toward predicting dinosaur genomes, especially certain kinds. Tyrannosaurus rex and Velociraptor, like all birds, were members of Theropoda, a group of bipedal, mostly carnivorous dinosaurs. In fact, birds, tyrannosaurs, and dromaeosaurs all belong to an even more specialized branch of theropods called Coelurosauria, which includes feathered dinosaurs and some of the most cognitively advanced species of the Mesozoic. Because birds are themselves coelurosaurian theropods, extinct lineages within Coelurosauria would be the most manageable for a bird-based proxy approach to resurrection. Of course, dromaeosaurs like the raptors, as well as other paravian species (troodontids), would be closer (and easier to piece together) than a T. rex. This is kind of cool for us because dromaeosaurs like Velociraptor and Utahraptor were likely among the most intelligent and agile of all dinosaurs. They are among the most interesting and recognizable dinos and it is a nice coincidence that they would be among the easiest to bring back.
The last common ancestor of all birds lived around 100 million
years ago in the late Cretaceous. Scientists have employed sophisticated bioinformatics to trace back how bird genomes evolved through deep time, attempting to piece together this genome. But the common ancestor that birds shared with the T. rex would have
lived 160 million years ago in the Jurassic. For velociraptors this number is more like 150 million years ago. Unfortunately, that leaves lots of time for these lineages to have changed appreciably from the common ancestor they share with birds. But as you will see in this section, we have so much more than just birds to inform us.
Next, let's take a look at various dinosaurs and non-dinosaur contemporaries (marine and flying Mesozoic reptiles) and see which would be the easiest to model given their evolutionary distance from living reptiles and birds. This table summarizes the time that has passed since each group shared a common ancestor with a living animal, giving a proxy for relatedness. The easiest animals to bring back will be those at the bottom of the list, and perhaps that is where a project like this should start.
Animal | Closest Living Relative | Time Since Divergence (millions of years) | Geologic Period |
Ichthyosaurs | Reptiles | 260 | Late Permian |
Plesiosaurs | Turtles | 250 | Early Triassic |
Pterosaurs | Reptiles | 240 | Middle Triassic |
Ornithischian Dinos | Birds | 240 | Late Triassic |
Sauropods | Birds | 230 | Late Triassic |
Abelisaurs | Birds | 190 | Early Jurassic |
Spinosaurs | Birds | 180 | Middle Jurassic |
Mosasaurs | Snakes | 170 | Middle Jurassic |
Allosaurs | Birds | 170 | Middle Jurassic |
Tyrannosaurs | Birds | 160 | Late Jurassic |
Dromaeosaurs | Birds | 150 | Late Jurassic |
Bird Common Ancestor | NA | 90 | Late Cretaceous |
So as you can see, it is not just birds that can help in this regard. Birds and crocodilians are living archosaurs, a taxonomic group (clade) that dinosaurs are also nested inside. That means that crocs (alligators, caimans, gharials, and crocodiles) have much to offer as well. They offer an excellent contrast with the birds to help us interpolate about dinosaurs. So again, luckily we are not extrapolating from birds, we are interpolating between birds and crocodilians.
Another helpful landmark, the bird-crocodilian split, happened around 250 million years ago in the Triassic. This was the common archosaur ancestor and scientists have recently used advanced comparative genomics techniques to reconstruct an estimated 50% of its genome at 91% accuracy. Genomic comparison of bird vs. croc
genomes identifies both deeply conserved elements and divergent innovations.
Using genomes from birds, crocs, and other reptiles would help us reconstruct dinosaur gene order, chromosomal arrangement, as well as estimate regulatory architecture. It definitely helps that we have access to
thousands of living archosaur and even coelurosaur genomes to study
and use as references.
Data from lizards, snakes, and turtles will also contribute. Even reptiles like the tuatara, a "living fossil," offer further comparative opportunities given their genetic distance from other reptiles, their slow-evolving genome, and ancient reptilian traits. In fact, there are many interesting species, including monotremes (egg-laying mammals), that could help constrain the baseline reptilian architecture upon which dinosaur genomic traits evolved. Take a look at the figure below that shows some of the amazing diversity of reptilian (sauropsid) forms that are present today that could be harnessed to make historical projections about dinos. That featherless parrot already looks a lot like a tiny T. rex to me.
There
are also some fascinating birds that lend comparative details. Large flightless
birds like the cassowary are about as close as modern birds get to true dinos
(their feet look just like the feet of the dinosaurs in your favorite Jurassic movie). Two
species of giant moas (Dinornis robustus and Dinornis novaezealandiae)
might provide some profound insights. These towering birds, the only birds without even a vestige of a wing, are now
extinct, although they were hunted as recently as 500 years ago. Over the last
decade ancient-DNA labs have recovered both mitochondrial and sizable nuclear
genomes from several moa species, including the two “giant moa,” one of which was taller than 11 feet and over 600 pounds. Because moa
evolved some of the largest body masses ever achieved by birds (up to 250 kg)
independently of today’s ratites (ostriches, emus, cassowaries), their genomes
are a natural experiment in avian gigantism. They give us a statistically
powerful way to see which genetic routes birds can and cannot take when they
achieve very large sizes. Thus, moas would supply our dinosaur-inference pipeline
with needed genotype-phenotype pairs at the extreme end of body size.
5.5 Other Related Techniques
Another source of information that I haven’t mentioned yet is knowledge of the skeletal forms or fossil remains of the ancestors of the species of interest. For instance, if you wanted to make predictions about the genome of a gorgonopsid (an early mammal-like reptile), you would want to use its skeleton, but you would also be interested in information about the relatives of the gorgonopsid. Understanding its phylogeny and incorporating information about the fossil skeletons of its predecessors, successors, and cousins would provide much relevant detail. Thus, analyzing and considering synapsid, pelycosaur, and therapsid remains in the gorgonopsid’s line of descent could tell you a lot about the constraints on the gorgonopsid genome.
Yet another form of information that could help constrain a
species’ genome is fossils of that species at different points in its lifetime.
Growth and development in a tyrannosaur could be compared to growth in birds
and used to inform developmental genetics. As mentioned earlier, these kinds of comparisons could
even be extended to prehatched dinos as the embryonic or fetal remains of dinosaurs are
sometimes found inside their shells.
Another completely different technique would be to forget about genetics and resurrection entirely and just train a system to generate imagery of dinosaurs. This would involve a machine learning system that learned to predict what a bird or reptile looks like based on its skeleton. That system could then be applied to dinosaur fossils. Thus it would receive and encode skeletons and use them to generate pictures of anatomy, body shape, and physical features. It could be trained using MRI scans of reptilian and avian bodies. Alternatively, skeletons could be matched with photographs of living animals and a system like this could create pictures of the subject using a generative technique like image diffusion. A system like this would help us picture how dinosaurs would have really looked, and its output could potentially be used as data to inform the genetic reconstruction. I have written more about how a system like this would work, here:
https://www.observedimpulse.com/2025/09/skull-to-face-using-ai-to-recreate-lost_9.html
Here is what it would look like if it were used to image faces from skulls.
It is worth mentioning that a process like this could be used to approximate the faces of our hominin ancestors. Neanderthal, Denisovan, Homo floresiensis, and other skulls could be entered into an AI model after the model has been trained on human and ape skull / face pairings. This could be combined with other techniques to predict the facial features of ancient humans, another sight I had long thought lost to time before the advent of modern AI.
Next, let's talk about an alternate approach to resurrection. Some scientists have been attempting to take living birds and change specific genetic features to create a throwback look. Jack Horner (the inspiration for Dr. Grant in the Jurassic Park series and technical advisor on the films) has a project to create a “chicken-o-saurus.” This project is aimed at creating a modified chicken that expresses dormant dinosaur-like traits. Dr. Horner wants to use gene-editing tools like CRISPR to reverse the more recent genetic changes that made birds less like dinos. He envisions a chicken with teeth, a long tail, arms with clawed hands, and a rounded snout rather than a beak. I imagine we would also want to remove feathers, the keel on the sternum, and the pygostyle (fused tail vertebrae).
There are several other de-extinction projects underway now, but they all involve animals whose entire genome has been recovered from their remains because they went extinct recently. Furthermore, none of these projects produce true clones or resurrections. They all involve taking a related animal and changing a few genes (similar to the chicken-o-saurus concept).
For example, Colossal Biosciences
claims to be bringing animals such as the woolly mammoth, the thylacine, and
the dire wolf back from extinction. But in reality, they are taking the extinct
genome as a reference and then editing genetic sites in the closest living
relatives to make a proxy animal with key traits. To “create” a woolly mammoth
they are giving Asian elephants “cold resistant” attributes such as fur,
increased fat, altered hemoglobin, and smaller ears. This is achieved by
multiplex gene editing. To create the “dire wolves” they edited 20 sites across
14 genes in gray-wolf DNA using sequences inferred from ancient dire wolf
remains and then cloned the edited cells. Why don’t these companies just build
an animal around the recovered genome? As the next section will explain, that
is just too far beyond today’s technology.
6.0 Dinosaur Embryology
Even if the present method resulted in a full, viable, synthetic Tyrannosaurus rex genome on a computer, bringing it to life would involve a complex series of technological steps. First it would have to go from zeros and ones on a computer to the actual DNA molecule, a long linear polymer that is likely one to three billion letters long, judging by the genomes of living birds and crocodilians. The genetic code would have to be synthesized in segments and stitched together using methods like Gibson assembly or yeast artificial chromosome (YAC) construction. So far this has been done for microbial DNA, up to whole bacterial genomes, but whole-genome synthesis has not been achieved for any animal. In other words, the technology to assemble the molecule does not yet exist. But let's say that we could build a genome de novo. Even then there would still be major roadblocks.
The synthetic genome would be inserted into a host cell, like an enucleated bird ovum. This could be done via somatic cell nuclear transfer (SCNT), similar to the method used for Dolly the sheep. It is worth mentioning that even though several mammals have been cloned, scientists are still unable to clone a bird. The machinery of the cell that is hosting our T. rex genome must recognize it and properly express its proteins, helping it build a body. There must be no mismatches or incompatibilities with the cellular (cytoplasmic) environment or with host mitochondrial DNA. This ovum would then need to be implanted within a surrogate egg (even an ostrich egg is estimated to be three to four times smaller than a T. rex egg) or an artificial womb. Incubation conditions would have to align precisely with those expected by a T. rex in embryological development.
Post-hatching, the
dinosaur would require intensive care, proper diet, temperature, humidity, and
parental, brood, and social interaction. You can see how difficult this would
be. Our technology is not there yet, and not even close, but of course it is possible that AI
may change this rather rapidly. Let’s also keep in mind that using the present method to create information about dinosaur genomes has value outside of de-extinction, such as deepening
our understanding of biology and evolution.
7.0 What Other Animals Could Be Modeled Using this Framework?
Dino genes must be inferred because no
truly intact, sequence-quality dinosaur DNA has ever been recovered and likely never
will be. The only genetic traces extracted from dino fossils are highly degraded
molecules (possible chromosome fragments, collagen sequences, and chemical DNA
markers) found inside exceptionally preserved dinosaur cartilage or bone. Even
these findings are controversial and nowhere near the quality needed for genome
sequencing or “de-extinction.” Experiments on ancient bones show that DNA has a
half-life of about 521 years at 13 °C. Colder conditions slow the decay, but even
at an ideal preservation temperature of about −5 °C the same decay model
predicts that every bond would be destroyed after around 6.8 million years.
Unfortunately, non-avian dinosaurs went extinct 66 million
years ago, roughly ten times beyond that limit. In fact, retrieving an entire genome
from the fossil remains of any species becomes very difficult after 100,000
years. After one million years, even if preserved by very cold or dry
conditions, any DNA that is recovered will be fragmentary.
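The decay argument can be made concrete with a few lines of arithmetic. The sketch below assumes simple exponential decay with the 521-year half-life quoted above. It ignores temperature, fragment length, and burial chemistry (cold specimens decay far more slowly), so it is only an order-of-magnitude illustration. The function names are hypothetical.

```python
import math

HALF_LIFE_YEARS = 521.0  # half-life quoted in the text (about 13 °C)

def half_lives_elapsed(age_years, half_life=HALF_LIFE_YEARS):
    """Number of half-lives that fit into a given age."""
    return age_years / half_life

def log10_fraction_intact(age_years, half_life=HALF_LIFE_YEARS):
    """log10 of the fraction still intact under exponential decay.

    Working in log space avoids floating-point underflow for very old samples.
    """
    return -half_lives_elapsed(age_years, half_life) * math.log10(2)

# Ages chosen to match the thresholds discussed in the text.
for label, age in [("100,000 years", 1e5),
                   ("1 million years", 1e6),
                   ("66 million years", 66e6)]:
    print(f"{label}: {half_lives_elapsed(age):,.0f} half-lives, "
          f"fraction intact ~ 10^{log10_fraction_intact(age):,.0f}")
```

At this warm-climate rate the surviving fraction for a 66-million-year-old sample is far too small to represent as an ordinary floating-point number, which is why the sketch reports exponents rather than the fractions themselves.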
The present technique is not exclusive to dinosaurs and could be applied to any extinct animal or plant (and possibly other kingdoms of life as well). These include
ancient animals such as trilobites, eurypterids (sea scorpions), giant
dragonflies, and ammonites; Mesozoic marine reptiles such as plesiosaurs,
mosasaurs, and ichthyosaurs; Pleistocene megafauna such as woolly rhinoceros,
cave lions, and giant ground sloths; early mammal-like reptiles (synapsids)
such as pelycosaurs and cynodonts; and recent human ancestors such as
australopithecines, Homo erectus, and Homo heidelbergensis. It should even work
on plants and fungi because we have many fossils of ancient plants. However, it is
unclear if a technique like this could meaningfully stretch back 500 million years to
Cambrian animal fossils like Haikouichthys, Anomalocaris, and Hallucigenia in the absence
of close modern relatives.
The fact that DNA can survive 10,000 years under favorable conditions means that animals such as dodos, thylacines (marsupial tigers), and woolly mammoths would not need the technique I am introducing here to be cloned and resurrected. Here is a list of recently extinct animals, most of which could in principle be recovered using their actual DNA. The table gives their approximate extinction date, closest living relative, and DNA status. The nearest relatives could be very important in providing information related to embryology, methylation patterns, and healthy development.
Chronological De-Extinction Candidate Table with DNA Status
| Species | Approx. Extinction (years ago) | Closest Living Relative | DNA Status |
| Thylacine | 90 | Numbat / Tasmanian devil | Yes. High-quality nuclear genome recovered; near-complete. |
| Passenger Pigeon | 111 | Band-tailed pigeon | Yes. Complete reference genome assembled (Revive & Restore). |
| Quagga | ~142 | Plains zebra | Yes. Partial nuclear genome; mtDNA complete; recoverable via zebra reference. |
| Great Auk | ~180 | Razorbill / puffin | Yes. Nuclear genome sequenced from museum skins; coverage improving. |
| Dodo | ~330 | Nicobar pigeon | Yes. Nuclear genome reconstructed from museum material; ongoing refinement. |
| Aurochs | ~400 | Domestic cattle | Yes. Draft genome assembled. |
| Moas | ~500–600 | Tinamous | Yes. Several moa species have nuclear genomes from ancient DNA. |
| Elephant Bird | ~1,000 | Kiwi / ostrich relatives | Yes. High-quality nuclear genomes from eggshell DNA. |
| Woolly Mammoth | ~4,000 | Asian elephant | Yes. Multiple high-coverage genomes from permafrost specimens. |
| Irish Elk | ~7,700 | Fallow / red deer | Yes. Partial genome fragments; recovery feasible with enrichment. |
| Dire Wolf | ~9,500 | Gray wolf | Yes. High-coverage genome sequenced (clarified distinct lineage). |
| Giant Ground Sloth | ~10,000 | Tree sloths | Yes. Medium-coverage genome from subfossil material; recoverable but fragmentary. |
| Saber-tooth Cat | ~10,000 | Modern big cats | No. DNA not recoverable (asphalt fossils destroy molecules). |
| Woolly Rhinoceros | ~10,000 | Sumatran rhino | Yes. High-coverage genomes from Siberian permafrost. |
| Short-faced Bear | ~11,000 | Spectacled / brown bears | Yes. Low-coverage nuclear DNA recovered; potentially improvable. |
| American Mastodon | ~11,000 | Asian / African elephants | Yes. Low-coverage genome available; additional data possible. |
| Cave Lion | ~14,000 | Modern lion / tiger | Yes. Multiple high-coverage genomes from frozen cubs. |
| Megalania | ~40,000 | Komodo dragon | No. DNA unconfirmed; subfossil material may yield short fragments. |
| Wonambi | ~40,000 | Modern pythons | No. DNA not recovered; fossils too mineralized for genome recovery. |
This next table contains a list of extinct human relatives and ancestors (hominins). Only two
have had their genomes reconstructed, and the genomes of most of the others are probably lost to
time. This table gives their extinction date, the time at which they diverged
from the lineage leading to modern humans, and DNA status. Clearly the more recent species with more recent
divergence dates would be easier to model and potentially resurrect using the techniques discussed
here.
Extinct Hominins and Human Ancestors: Chronological Table with DNA Feasibility
| Species | Approx. Extinction (years ago) | Divergence from H. sapiens (Mya) | Epoch / Period | DNA / Genome Status | Feasibility & Notes |
| Homo neanderthalensis | ~40,000 | ~0.6–0.8 | Late Pleistocene | Yes. Multiple high-coverage genomes | Fully sequenced; interbred with modern humans |
| Homo denisova | ~40,000 | ~0.6–0.8 | Late Pleistocene | Yes. Multiple high-coverage genomes | Distinct branch sister to Neanderthals |
| Homo floresiensis | ~50,000 | ~1.8 | Late Pleistocene | No DNA but possible | Derived from early erectus; diminutive island species. |
| Homo luzonensis | ~67,000 | ~1.8 | Late Pleistocene | No DNA but possible | Possibly descended from early Asian Homo lineages. |
| Homo erectus | ~117,000 | ~1.8–2.0 | Late Pleistocene | No DNA; likely gone | One of the longest-lived human species |
| Homo naledi | ~240,000–330,000 | ~2.0 | Middle Pleistocene | No DNA; likely gone | Surprisingly recent species with small brain |
| Homo heidelbergensis | ~200,000–300,000 | ~0.8–1.0 | Middle Pleistocene | No DNA; likely gone | Transitional ancestor to Neanderthals and modern humans. |
| Homo antecessor | ~800,000 | ~1.0 | Early Pleistocene | No DNA; likely gone | One of the oldest Europeans |
| Homo habilis | ~1.6–2.3 million | ~2.1 | Early Pleistocene | No DNA; likely gone | First “tool-maker” of the genus Homo. |
| Paranthropus boisei / robustus | ~1.0–1.2 million | ~2.5 | Early Pleistocene | No DNA; likely gone | Robust chewing lineage |
| Australopithecus afarensis | ~3.0 million | ~3.0–3.5 | Pliocene | No DNA; likely gone | Bipedal; transitional ape–human morphology |
| Ardipithecus ramidus | ~4.4 million | ~4.5–5.5 | Early Pliocene | No DNA; likely gone | Facultative biped; earliest well-known hominin anatomy. |
| Sahelanthropus tchadensis | ~7.0 million | ~6.8–7.0 | Late Miocene | No DNA; likely gone | Possibly the first species after the chimp–human split (~7 Ma). |
8.0 Weaknesses of This Approach
The method I have introduced here will
not unearth the actual genomes; it will just make incredibly informed guesses about them.
Even a highly advanced AI system will not be able to reproduce sequences
where evolution introduced significant novelties, lineage-specific adaptations,
or regulatory rewiring that has no modern parallel. All of the actual genetic
mutations and adaptations that dinosaurs accumulated after their divergence from other reptiles, and that are not found in birds, are lost to time. Furthermore,
such a synthetic genome could result in a visually compelling likeness or an
uncanny simulacrum of artistic renderings but may largely fail to reproduce
internal regulation. The process could result in animals that look like the
dinosaurs in the movies but whose physiology and even behavior are closer to
those of birds or crocs. This would risk creating a "chimeric reconstruction"
rather than a resurrection.
Chromosome number further confuses things. Some
birds have 40 chromosomes, others have over 140, and it is anyone’s guess how
many T. rex had. Regulatory sequences (e.g., promoters, enhancers, and
silencers) control when, where, and how much genes are expressed. They are
often species-specific and evolve rapidly. A T. rex might have had unique
enhancers for muscle growth or bone density that no longer exist in its living
relatives. Epigenetic modifications (DNA methylation and histone modification)
also influence gene activity, but without altering the DNA sequence. These marks
decompose with the genes, so AI would have to hypothesize working epigenetic
profiles based on modern analogs, adding further uncertainty. Non-coding DNA (DNA that
does not code for proteins, e.g., introns, regulatory elements, and supposed
junk DNA) also poses an issue. In humans it comprises about 98-99% of the genome. It evolves
differently from coding regions, often contains lineage-specific adaptations,
and lacks clear genotype-phenotype correlations. The present skeletal
morphology technique could capture coding genes linked to bone structure, but
non-coding DNA’s role in overall genome stability, gene regulation, and
phenotypic variation would remain unaddressed.
It is also important to mention that there are serious ethical and ecological concerns at play here outside the scope of this entry. For instance, humans artificially selected pug dogs to have a collapsed snout because they liked the way it looked; however, this made it difficult for the dogs to breathe. There are many examples of domestication creating disease states. These examples make it clear that an engineered dinosaur could be born into an uncomfortable, painful, or diseased body. Hollywood has already pointed out many of the ethical quandaries of de-extinction including animal cruelty, human safety, and invasive ecological concerns. However, at the same time, de-extinction science overlaps greatly with conservation science and many de-extinction techniques can be used to help present day animals that are on the verge of extinction.
Currently, you cannot fit an entire
vertebrate genome into the context window of a transformer-based neural network. This means that it
cannot take the entire genome into account when making predictions and that
some long-range dependencies may not be recognized. Bird genomes are around 1
billion base pairs (1.0 to 1.3 gigabase pairs (Gb)) and crocodile genomes
contain around 2 to 3 billion. ChatGPT can only hold about 128,000 tokens at a
time and Google Gemini can hold around one million. That means that, at roughly one token per base pair, the
context window needs to be over 1,000 times bigger. The industry has seen
context windows doubling every 18 to 24 months. A 1,000-fold increase takes about ten doublings, so at this rate it would be around 15 to 20
years before a transformer’s window of attention can encompass all of the DNA
in question. Of course, there are many ways to get around this, even today (preprocess
the genome into embeddings, prioritize known relevant loci, and use
hierarchical architectures), but this is just one example of the fact that as
technology progresses this idea becomes more feasible.
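A timeline like this can be sanity-checked by counting doublings. The sketch below is only an extrapolation under stated assumptions: one token per base pair, a fixed doubling period, and the round genome and window sizes mentioned above. The function names are hypothetical.

```python
import math

def doublings_needed(current_tokens, target_tokens):
    """How many doublings take a context window from current to target size."""
    return math.log2(target_tokens / current_tokens)

def years_until(current_tokens, target_tokens, months_per_doubling):
    """Years required at a fixed doubling period (an extrapolation, not a forecast)."""
    return doublings_needed(current_tokens, target_tokens) * months_per_doubling / 12

# Assumption: one token per base pair. Real genomic tokenizers may pack
# several bases into one token, which would shrink the gap.
GENOMES = {"bird (~1 Gb)": 1_000_000_000, "crocodile (~3 Gb)": 3_000_000_000}
WINDOWS = {"128k-token window": 128_000, "1M-token window": 1_000_000}

for genome_name, genome_size in GENOMES.items():
    for window_name, window_size in WINDOWS.items():
        fast = years_until(window_size, genome_size, 18)  # doubling every 18 months
        slow = years_until(window_size, genome_size, 24)  # doubling every 24 months
        print(f"{genome_name} from a {window_name}: "
              f"{genome_size / window_size:,.0f}x larger, "
              f"{fast:.0f} to {slow:.0f} years")
```

The result is sensitive to the tokenization assumption: packing several bases into each token, or compressing the genome into embeddings first, shortens the estimate considerably.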
The jump from predicting skeletal
morphology to generating functional genomes capable of producing viable
organisms is a significant one. The mappings are highly nonlinear and
dependent on environmental context. Moreover, a dinosaur skeleton could be consistent with
multiple plausible genomes, and there is no way to actually test whether any are
accurate. Validating which could be biologically viable could also be very
difficult. One of the most sobering truths about this enterprise is the difficulty
inherent in embryology. To progress from a zygote, to an embryo, to a fetus, to
a healthy young animal, the genetic blueprint must be incredibly internally
consistent. This is easy for nature to accomplish, but just having an AI dream
up (or worse, hallucinate) an animal genome gives little reassurance that there
will not be structural inconsistencies due to the tremendous complexity of gene
interactions. Everything must work together, and work just right, to avoid
developmental failure. Of course, this is a problem that a far-future
superintelligence could solve, but it won’t be solved using the
three-step pipeline outlined above.
But this pipeline may be more useful than it first seems, and it generalizes outside of genetics. The
three-step methodology outlined here is not just suited for biology. It could
be used as a generalizable framework for cross-domain translation, where one
set of observable features (like morphology) is used to infer a related but
unobservable set (like genetics) via a latent, learned manifold. In fact, this
method (train forward, invert, apply to unknowns) is a flexible blueprint for
abductive reasoning via deep representation learning, and it could have
enormous potential in chemistry, physics, and even psychology. You would use it in cases where you have
A, B, and C, but not D, and where A is to B as C is to D. We could call it "bidirectional
manifold mapping for latent inference." It learns to model a latent manifold
that encodes the “grammar” of a domain. Once that space is shaped well enough,
inverting across it becomes a powerful general inference engine.
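The train-forward, invert, apply-to-unknowns pattern can be illustrated with a deliberately tiny toy. The sketch below is not the pipeline itself: it swaps the deep, nonlinear model for a noise-free linear map fitted by least squares, and every name and number in it (TRUE_MAP, the two hypothetical "genetic" factors, the three "skeletal" measurements) is an assumption invented for illustration.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def inv2(m):
    """Invert a 2x2 matrix in closed form."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

def least_squares(x, y):
    """Solve min ||x @ w - y|| for w, where x has exactly two columns."""
    xt = transpose(x)
    return matmul(inv2(matmul(xt, x)), matmul(xt, y))

# Hypothetical ground truth, used only to simulate paired data:
# 2 latent "genetic" factors -> 3 observable "skeletal" measurements.
TRUE_MAP = [[1.0, 0.5, -0.3],
            [0.2, -1.0, 0.8]]

def simulate_pairs(n, seed=0):
    rng = random.Random(seed)
    genotypes = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n)]
    return genotypes, matmul(genotypes, TRUE_MAP)

# Step 1: train the forward model (genotype -> phenotype) on known pairs.
train_g, train_p = simulate_pairs(50)
forward = least_squares(train_g, train_p)  # 2 x 3 matrix

# Step 2: invert it. For an observed phenotype p, solve p ≈ g @ forward for g.
def infer_genotype(phenotype):
    solution = least_squares(transpose(forward), [[v] for v in phenotype])
    return [row[0] for row in solution]

# Step 3: apply to an "unknown" whose genotype the model never saw.
hidden_genotype = [0.4, -0.7]
observed_phenotype = matmul([hidden_genotype], TRUE_MAP)[0]
estimate = infer_genotype(observed_phenotype)
print(estimate)
```

In this toy the inversion is unique because the map is linear and the phenotype has more measurements than the genotype has factors. A real genome-to-skeleton mapping would be many-to-one and nonlinear, so an inverse would return a family of plausible genomes, not a single answer.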
9.0 Conclusion
One of the fondest memories I have is
of reading Michael Crichton’s Jurassic Park in third grade, before the first
movie came out, and marveling at the idea of extracting dino DNA from a
mosquito trapped in amber. Today, we know any DNA contained within those mosquitoes has completely decomposed, but it sure felt
elegant at the time. Even without DNA recovery, AI promises a new kind of
“virtual paleogenetics,” a way to infer and simulate the genomes and
physiologies of extinct organisms using bioinformatics, sophisticated
prediction, and comparative analysis. This is due in part to another scientific concept that enraptured me as a child (and the original Jurassic Park novel got right), that birds are dinosaurs.
Despite the major hurdles, the method
outlined here could be a good starting point for several reasons. Creating the
first genome-to-skeleton bidirectional models would produce incremental
value at each step, advancing the understanding of genotype-phenotype
interactions. Even without resurrection, these AI-generated dino genomes could
allow synthetic cell lines for studying dinosaur biochemistry in vitro,
organoid models, or simulations of growth curves, muscle structure, and
thermoregulation.
The present method would only produce
a synthetic “consensus dinosaur genome,” not a true historical sequence. It
would be educated guesswork, but firmly grounded in real data. It could even
result in living and breathing organisms that look how we expect dinosaurs to
look, although they may not function or act like the real thing. I believe that
creating a Jurassic Park with cloned approximations will be possible when
methods like those discussed here are in the hands of superintelligent AI agents.
In fact, I wouldn’t be surprised if this fantasy could be brought to life within our
lifetimes.
Note: I started writing this blog after having a strong sense that artificial superintelligence could achieve dinosaur resurrection and I wanted to provide a description of that vision. But after posting the blog entry, I knew something was missing. So I sat on the couch for 10 minutes and racked my brain, telling myself over and over that there was something important I was missing. And then somehow the idea for the three-step pipeline basically entered my mind fully formed. It felt like the idea just materialized, possibly from unconscious incubation, because there was next to zero reasoning involved. It’s not going to be easy to implement, but I do think it could reach and harness key latent information hidden in dinosaur fossils.







