Friday, July 18, 2025

How to Use AI to Reconstruct Dinosaur Genomes from Bird DNA and Skeletons


Jared Edward Reser Ph.D.

7/18/25

Citation for this post:

Reser, J.E. (2025). AI-Mediated Reconstruction of Dinosaur DNA from Fossil Morphology and Extant Genomes (1.0). AI Thought. https://doi.org/10.5281/zenodo.17604519


1.0 Introduction

I have dreamed of witnessing a real-life dinosaur since childhood, but the fact remains that we may never get our hands on original dinosaur DNA. Unfortunately, the genetic traces (even those trapped in amber) degraded completely tens of millions of years ago, a major disappointment for dino aficionados and Jurassic Park fans. But as I have been watching the advance of artificial intelligence over the last couple of years, I have realized that AI creates new possibilities. Modern machine learning has an astounding capacity for prediction, for finding hidden connections, and for cross-referencing. Generative AI excels at constructing domain-relevant data structures when provided sufficient context and training examples. And agentic AI is being programmed to plan, organize, and automate complex work at scales outside the abilities of people. Now in 2025, it is rather easy to imagine a superintelligent AI in the future tinkering until it brings back a guesswork-based, life-sized biological recreation of a Tyrannosaurus rex. So, even if creating precise clones of long-extinct creatures is impossible, it does seem possible that advanced AI agents will, in the coming years, create surprisingly accurate guesswork reconstructions of dinosaur DNA.

This entry will discuss the information available to a superintelligent AI to help it piece together the clues necessary to do this. It will also propose a three-step pipeline using machine learning to generate a synthetic dinosaur genome. This pipeline leverages existing data to gather information about something we don’t have, dinosaur DNA. Essentially, we use data from modern birds and reptiles, where we have both the DNA and the bones, to train a system to recognize how genes shape skeletons. Then, by giving this trained system a dinosaur skeleton, we can ask: “What kind of genes would have built this?” In essence, it uses paired genomic and skeletal data from extant birds and reptiles to extract latent genetic signals and learn structure-function relationships, which we then apply to fossilized dinosaur skeletons to infer their genomic makeup.

Allow me to offer an analogy... involving analogies. Remember those analogical reasoning questions from the SAT? They would state two pairs of related things and then ask you how they share a common relationship. Here is an example:

“Blueprint is to building as genome is to body plan.”

All these types of questions would share the pattern A:B::C:D. Well, the technique laid out here shares that analogical structure. If A (bird skeleton) is to B (bird DNA), as C (dino skeleton) is to D (dino DNA), then what does D equal? Bird genomes are related to bird skeletons (both high dimensional vectors) by a complex analogy (a mathematical function). The math behind that analogy is beyond human capability to derive, but is tractable for machine learning. That is because it is less conceptual and more computational. But, once the machine has quantified that analogy it can apply it to dinosaur fossils to reason about dino DNA. This is because the way a genome produces a skeleton in birds is analogous to how a genome must have produced a skeleton in dinosaurs. 

A dinosaur running through a computer server


2.0 The AI System that Would Be Necessary

Before we get into specifics, let’s use this section to discuss the generalities. Let’s describe a general multimodal AI system that could reverse-engineer dinos by blending paleontology, comparative genomics, structural biology, and molecular AI. To assemble a synthetic approximation of a dinosaur genome, I have been envisioning an autoregressive, attention-based, neural network AI that is based on the transformer architecture. Such a system could be trained on genomes, especially genomes similar to the species that the scientist is trying to resurrect. For dinosaurs, this would be birds and crocodilians (archosaurs), and various other reptiles (sauropsids). Think of it like GPT for genes. Instead of predicting the next word, this system would predict the next nucleotide (or codon/amino acid). In other words, you enter a sequence of DNA and then it performs statistical autocomplete to make assumptions about what could come next. There is already research, and proofs of concept, in this vicinity.
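To make "statistical autocomplete" concrete, here is a toy sketch. It is not a transformer. As a much simpler stand-in, it uses an order-k Markov model that counts which nucleotide follows each k-letter context, which is enough to show what "predict the next nucleotide" means. The function names and the two short training sequences are made up for illustration.

```python
from collections import Counter, defaultdict

def train_kmer_model(sequences, k=3):
    """Count which nucleotide follows each k-mer in the training sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for i in range(len(seq) - k):
            counts[seq[i:i + k]][seq[i + k]] += 1
    return counts

def next_nucleotide_probs(counts, context, k=3):
    """Return P(next base | last k bases); fall back to uniform for unseen contexts."""
    followers = counts.get(context[-k:])
    if not followers:
        return {base: 0.25 for base in "ACGT"}
    total = sum(followers.values())
    return {base: followers.get(base, 0) / total for base in "ACGT"}

# Toy "genomes" (invented strings, not real sequences).
training = ["ACGTACGTACGA", "TTACGTACGTAC"]
model = train_kmer_model(training, k=3)

# The context ends in "ACG", which was followed by T four times and A once.
probs = next_nucleotide_probs(model, "GGACG", k=3)
print(probs)
```

A real genomic language model replaces the lookup table with a neural network that can use context hundreds of thousands of bases away, but the training signal is the same kind of next-token prediction.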

Existing software tools that use transformers to make predictions about human genes include DNABERT, Nucleotide Transformer, and Enformer. Just a few weeks ago Google DeepMind released a transformer-based genomic AI called AlphaGenome. It’s not the first, but it is currently the best glimpse into the applicability of AI for genomics. AlphaGenome is designed to interpret long DNA sequences (up to 1 million nucleotides: As, Cs, Ts, and Gs) with base-pair precision. You give it a DNA sequence, and it can predict important features like where genes start and stop, how genes interact, and how active certain regions are in different cells. It helps scientists better understand how our genome works, how genetic changes might cause disease, and how we might design or edit DNA more intelligently. The transformer architecture behind GPT works well for genome prediction for much the same reason it works for language: genomes contain long-range dependencies that resemble linguistic grammar and narrative structure. This allows such a system to recognize important patterns in DNA and understand their repercussions for later sequences.

Overall, a system like this, if trained on hundreds of reptile genomes to learn the relevant long-range dependencies, regulatory motifs, and intron-exon patterns, might be able to help make educated guesses about the DNA of any reptile (and remember, dinosaurs are reptiles). An AI system that also uses language and reasoning could be trained to access and reference lots of other herpetological data. Thus, the AI would have to be multimodal, taking information and building context from as many sources as possible. Such a model, trained on existing (extant) animals, would provide guardrails for producing hypothetical reptiles. However, to produce a specific dinosaur, it would need as much information as possible about the dinosaur species of interest. Luckily, we know a lot about a lot of dinosaurs. Such a system, if empowered by agentic AGI (which is developing rapidly), could take this on as a long-time-horizon project. Not just a chatbot, it would plan on its own, breaking the assignment down into subtasks and creating workflows. Prompted with a fossil from a specific dinosaur species and its phylogenetic position, it would research how to best generate probabilistic genome drafts. These genome drafts would be conditionally generated rather than inferred by genetic (or phylogenetic) parsimony. Such a system could make edits and revisions, iteratively manipulating a conjectural genome to get closer and closer to a reverse-engineered dinosaur.

Just as AlphaFold mastered complex 3D protein folding with transformers, models like the one described here might be capable of generating functional blueprints of extinct species. Given enough funding, computing power, and research, a system like this could theoretically produce a genetic sequence capable of being turned into an animal. However, keep in mind that even if we had a functional genome for a dinosaur it would still be very difficult to grow / clone it. In fact, as we will see, this proposed project is much more difficult than it sounds, and is likely beyond the abilities of a human or even teams of humans. After reading this though, you might become convinced that a future superintelligence could tackle it.

So far, this idea is very general and nebulous. We have a “multimodal superintelligent AI” trained on birds, crocodiles, turtles, snakes, and lizards, that somehow takes context from a specific dinosaur species and uses that to infer genetic sequences. But let’s get more specific and focus on what information exists within dinosaur skeletons and how it could be extracted.

3.0 Training an AI System to Predict Dino Genomes Based on Bird Skeletons

My proposed method would start by pretraining two linked neural network models, one pretrained to work with skeletons and the other to work with genomes. These two models would then work together, learning from examples to predict skeletal anatomy from whole-genome data coming from living reptiles and birds. Basically, researchers would give the computer a bird genome as input and teach it to predict the outlines of the skeleton. Like other machine learning systems, it would use backpropagation to strengthen the connections responsible for correct predictions and weaken the connections responsible for incorrect ones. It would know how to provide supervised feedback because it has the actual "ground truth" genomes and skeletons taken from the living animals. This learning would require linking the variation in genetic sequences to measurable phenotypic differences in the bones (size, shape, proportions, articulation geometry).
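The supervised feedback loop described above can be sketched at toy scale. The example below is an assumption-laden miniature: instead of a deep network trained with full backpropagation, it fits a single linear layer by gradient descent, which is the simplest case of the same idea. The two "genome features" and the "skeletal measurement" are invented numbers, and train_linear_model is a hypothetical name.

```python
def train_linear_model(examples, learning_rate=0.1, epochs=500):
    """Fit measurement = w . features + b by gradient descent on mean squared error."""
    n_features = len(examples[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(examples)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for features, target in examples:
            prediction = sum(w * x for w, x in zip(weights, features)) + bias
            error = prediction - target  # feedback from the ground-truth skeleton
            for j, x in enumerate(features):
                grad_w[j] += 2 * error * x / n
            grad_b += 2 * error / n
        # Nudge every parameter in the direction that reduces the error.
        weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b
    return weights, bias

# Made-up training pairs generated by: measurement = 2*x0 + 1*x1 + 0.5
examples = [([0, 0], 0.5), ([1, 0], 2.5), ([0, 1], 1.5), ([1, 1], 3.5)]
weights, bias = train_linear_model(examples)
print(weights, bias)
```

After training, the learned weights sit close to the values used to generate the data. A real genome-to-skeleton model would have billions of parameters and nonlinear layers, but each update would follow the same logic of comparing a prediction against a measured skeleton.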

But the models connecting genes to bones would need base knowledge before training. It wouldn’t work to start off by letting a chicken genome predict an entire chicken skeleton from scratch. That works for language models because they only predict the next word, but it would never work with something as complex as a skeleton. The system needs strong priors, so we would have to start with individual pretrained models for genes and skeletons and then connect them. It is not clear if the pretrained DNA foundation model should be trained on every genome we have, or just animals, or just vertebrates. But it would need to be exposed to thousands of genomes so it knows how to compress them into meaningful embeddings.

Then a separate model would have to be pretrained on 3D skeletal shapes. The skeletal AI system would look at things like how limb bones scale across species, what structural features co-occur (femur angle with pelvic width), and general anatomical logic. This could involve a pretrained shape encoder that is trained on thousands of CT scans (or mesh landmarks) from birds and reptiles. Then these two models (genes and bones) would be aligned and trained as a joint model (e.g. contrastive learning, variational inference, diffusion bridge) to associate genome embeddings with morphology embeddings. This approach would be like teaching someone who already speaks fluent “genome-talk” and fluent “anatomy-talk” to become a translator between them instead of having to learn both languages from scratch at the same time. Merging the models could be done with relatively modest supervised training data, because the hard work of representation would have already taken place in pretraining.
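Of the alignment options mentioned above, contrastive learning is the easiest to sketch. The example below computes an InfoNCE-style loss in one direction (genome to skeleton) for a small batch: the loss is low when each genome embedding is most similar to its own species' skeleton embedding and high when the pairs are mismatched. The two-dimensional embeddings are invented, and in practice they would come from the two pretrained encoders.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce_loss(genome_embs, skeleton_embs, temperature=0.1):
    """Mean cross-entropy of picking each genome's own skeleton out of the batch."""
    n = len(genome_embs)
    loss = 0.0
    for i in range(n):
        logits = [cosine(genome_embs[i], s) / temperature for s in skeleton_embs]
        top = max(logits)  # subtract the max for numerical stability
        log_denominator = top + math.log(sum(math.exp(l - top) for l in logits))
        loss += log_denominator - logits[i]
    return loss / n

# Hypothetical embeddings for three species.
genomes = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
aligned = [[0.9, 0.1], [0.1, 0.9], [-0.9, 0.1]]   # skeleton i matches genome i
shuffled = [aligned[1], aligned[2], aligned[0]]   # pairs deliberately mismatched

print(info_nce_loss(genomes, aligned), info_nce_loss(genomes, shuffled))
```

Training the joint model would mean adjusting both encoders to push this loss down across many species, so that a genome and its skeleton land near each other in the shared embedding space.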

This project would necessitate the genomes and naked skeletons of many reptiles and birds to build a library of examples (a genotype-to-phenotype training corpus). This may necessitate genome and skeleton data from thousands of extant reptiles. There are over 10,000 species each of reptiles and birds, and this is great, because the more data the better. As of February 2022, genomes had been completed for over 540 species of birds, including at least one from every bird order. This is important because distantly related species will help the system capture the diversity of forms. However, the system would also benefit from being exposed to intraspecies diversity, meaning that it should help to have numerous samples from the same species. The more, the better. It is quite possible that other vertebrate (chordate) data, like fish, amphibian, and mammalian data, could strengthen the model's ability to work with dinosaurs. Either way, many animals’ genomes would have to be sequenced because numerous training examples would be needed.

How would we prepare the data for the computer? The genome is already a linear sequence of nucleotides ready for machine learning (e.g. A, C, T, G...). The skeletons are physical objects, so they would have to be 3D mapped in a computer and broken down quantitatively into strings of numbers so the AI can ingest them. This is addressed by the field of quantitative morphometrics, and there are already large existing datasets for many animal species. The morphological traits would be encoded as embeddings in a multidimensional embedding space where similar traits are located near each other.
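A minimal sketch of both preparation steps follows. It assumes the simplest possible encodings: one-hot vectors for nucleotides, and 3D landmark coordinates that are centered and scaled to unit centroid size, a standard first step in geometric morphometrics. A full Procrustes alignment would also rotate each specimen into a common orientation, which is omitted here. The landmark coordinates are invented.

```python
import math

NUCLEOTIDES = "ACGT"

def one_hot_encode(sequence):
    """Map each base to a 4-element indicator vector; unknown bases (e.g. N) become all zeros."""
    return [[1 if base == n else 0 for n in NUCLEOTIDES] for base in sequence.upper()]

def normalize_landmarks(landmarks):
    """Center 3D landmarks on their centroid, scale to unit centroid size, and flatten."""
    n = len(landmarks)
    centroid = [sum(p[axis] for p in landmarks) / n for axis in range(3)]
    centered = [[p[axis] - centroid[axis] for axis in range(3)] for p in landmarks]
    centroid_size = math.sqrt(sum(c * c for p in centered for c in p))
    return [c / centroid_size for p in centered for c in p]

encoded = one_hot_encode("ACGTN")

# Four hypothetical landmarks on a femur, in millimeters.
femur_landmarks = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (10.0, 2.0, 0.0), (0.0, 2.0, 1.0)]
shape_vector = normalize_landmarks(femur_landmarks)
print(encoded[0], len(shape_vector))
```

Removing position and size this way means a hummingbird femur and an ostrich femur can be compared on shape alone, with body size carried as a separate variable.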

Then, we would teach the machine to use the genome to predict what kind of skeleton it would make. A deep neural network using a transformer-like architecture would learn the statistical mappings from sequence space to shape space. 

Even an expert geneticist may feel doubtful about the ability of a machine learning system to fluently translate between DNA and skeletons. But keep in mind that this is what these autoregressive systems do, they are uncanny at learning how to predict one sequence from another. Contemporary systems have billions of parameters to tweak in order to memorize the intricate relationships between the input sequences and their corresponding output sequences. And there would be ways to test the model to see if it is performing accurately. The model would have to be validated with data not used in training. For example, we could hold back the data from chickens in order to see how close to an actual chicken skeleton the system can get when given a chicken genome. If the system performs well, we would move to the second step in the pipeline, taking this forward model and running it backwards.
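The hold-out test described above can be sketched as a leave-one-species-out loop. Everything here is a placeholder: the records are invented numbers, and the "model" is a nearest-neighbor lookup standing in for the trained network, chosen only so that the example runs on its own.

```python
def leave_one_species_out(records, fit, predict, error):
    """Hold out each species in turn, train on the rest, and score the held-out prediction."""
    scores = {}
    for held_out in records:
        # Drop every record of the held-out species so nothing leaks into training.
        training = [r for r in records if r["species"] != held_out["species"]]
        model = fit(training)
        predicted = predict(model, held_out["genome_features"])
        scores[held_out["species"]] = error(predicted, held_out["skeleton"])
    return scores

def fit_nearest_neighbor(training):
    return training  # the stand-in "model" is just the stored examples

def predict_nearest_neighbor(model, genome_features):
    def squared_distance(record):
        return sum((a - b) ** 2 for a, b in zip(record["genome_features"], genome_features))
    return min(model, key=squared_distance)["skeleton"]

def mean_abs_error(predicted, actual):
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

# Invented genome features and skeletal measurements, one record per species.
records = [
    {"species": "chicken", "genome_features": [0.9, 0.1], "skeleton": [1.0, 0.30]},
    {"species": "turkey", "genome_features": [0.8, 0.2], "skeleton": [1.2, 0.32]},
    {"species": "alligator", "genome_features": [0.1, 0.9], "skeleton": [3.0, 0.10]},
]
scores = leave_one_species_out(
    records, fit_nearest_neighbor, predict_nearest_neighbor, mean_abs_error
)
print(scores)
```

In this toy run the held-out chicken is predicted well because a close relative (the turkey) remains in the training set, while the alligator is predicted poorly because nothing similar is left. That pattern is the same concern that applies to dinosaurs: prediction quality should be expected to fall with distance from the training species.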

Once the forward model is fully trained, we would invert it, applying it in reverse to predict plausible genomic sequences from input skeletal morphology. This is shown in the second step seen in the figure below. We would feed the inverted model skeletal data and train it to output genomes. This bi-directional genotype-phenotype modeling is similar to recent breakthroughs in protein design (e.g. Chroma, RFdiffusion) where forward prediction unlocks plausible generative capacity in the inverse direction.

 


Once the model is adequately trained to turn skeletal data from living vertebrates into genetic sequences it could be fed morphometric data from a fossil dinosaur skeleton. This would necessitate a well-preserved and 3D mapped fossil dinosaur skeleton. After feeding this data into the inverse model, it would output a distribution of plausible genomic sequences consistent with bird-reptile sequence-morphology mappings (sampling from a probabilistic output space). If a bird’s genome explains its skeleton, then a dinosaur’s skeleton must have hidden information that can hint at its genome. You can see why this would not generate the true dinosaur genome. But it would generate a “best-fit plausible genome” grounded in comparative genomics and constrained by actual fossilized dino anatomy. This general idea resembles recent AI breakthroughs in structure-function inference in proteins, but here is applied to deep-time vertebrate paleogenomics.
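One simple way to picture "running the forward model backwards" is rejection sampling: propose many candidate genotypes, push each through the forward model, and keep those whose predicted skeleton matches the fossil. This is only one possible inversion strategy, swapped in here for illustration, and the forward model below is a hypothetical hand-written function over six binary variants rather than a trained network.

```python
import random

def forward_model(genotype):
    """Hypothetical stand-in for the trained genome-to-skeleton model.

    Maps six binary variants to two skeletal measurements.
    The sixth variant has no skeletal effect at all.
    """
    limb_length = 1.0 + 0.5 * genotype[0] + 0.5 * genotype[1] + 0.25 * genotype[2]
    skull_width = 0.5 + 0.3 * genotype[3] + 0.3 * genotype[4]
    return (limb_length, skull_width)

def sample_plausible_genotypes(target, n_draws=5000, tolerance=0.05, seed=0):
    """Keep random genotypes whose predicted skeleton lies within tolerance of the target."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        candidate = tuple(rng.randint(0, 1) for _ in range(6))
        predicted = forward_model(candidate)
        distance = max(abs(p - t) for p, t in zip(predicted, target))
        if distance <= tolerance:
            accepted.append(candidate)
    return accepted

fossil_measurements = (1.75, 0.8)  # invented values for a fossil specimen
posterior = sample_plausible_genotypes(fossil_measurements)
distinct = set(posterior)
print(len(distinct), sorted(distinct))
```

Several different genotypes reproduce the same fossil measurements, and the variant with no skeletal effect is left completely undetermined. That is the toy version of why the output is a distribution of plausible genomes rather than a single answer, and why evidence beyond the skeleton is needed to narrow it.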

It is worth mentioning that it wouldn’t make sense to start by predicting bird genomes from bird skeletons; it must be done the other way around. Step one in this pipeline is essential because the direction of causality matters: genes cause bodies. Also, many different genotypes could result in similar skeletons, and thus predicting genomes from skeletons requires learning a much more complex probability distribution. That is why starting by predicting skeletons from genomes creates a framework of internalized generative biological relationships which can be used to constrain guesses when running it backwards on fossils. It is like learning how to bake bread: you must first learn how ingredients combine to make bread before you can try to work backwards and guess what ingredients went into a loaf by looking at its shape and texture. Again, this method may sound outlandish at first, but it’s actually exactly what these models do and are being full-stack optimized to get better at: to exhaustively search for and learn all of the predictive patterns that can be found between an input sequence and its output sequence.

By training a model on species where both the genome and skeleton are known, it becomes possible to map out a shared space where genetic traits and anatomical features co-vary in predictable ways. It also unearths latent traits, which are hidden patterns that connect DNA and anatomy. While it will never recover full genomes, it will generate constrained, probabilistic profiles, possibly offering the most scientifically grounded glimpse yet into the genomic architecture of extinct species. Of course, adding more information, aside from just skeletal anatomy, will help us reconstruct genomes with even more precision. Let’s discuss this next.

4.0 What Other Information Might Assist In This Reconstruction?

We can glean a lot of pertinent information from fossils. Bone morphology gives us size, shape, posture, musculature, joint angles, growth rate, vascularization, and stress markers. Many fossilized bones can be studied under a microscope, revealing histological information. The bones also give us lots of internal geometry that relates to internal organs. Comparing fossils of animals of different ages provides information about the way the bones change over lifespan as well as growth curves and the hormones and chemicals that might underlie them. This all informs which developmental genes (e.g., FGF, BMP, Runx2) likely governed these features. Bones alone could reveal significant genotype-to-phenotype mappings.

But paleontologists have so much more on dinosaurs than just bones. There are also fossils of soft tissue impressions that show skin, feathers, and scales, providing copious information that would be necessary for approximating dino features. These impressions offer a wealth of information about the shape and composition of soft tissues that would have sat on top of the bones. Given enough of this information about the 3D structure of the exterior of the body, it could also be analyzed using the three-step pipeline above. In that case, it would be compared to the exterior of birds and reptiles. Just look at this exceptionally well-preserved (practically mummified) fossil of the ankylosaur Borealopelta. There are also some findings of dinosaur internal organs and blood vessels. Data like this provides a wealth of anatomical information that could be utilized. Even if you don't have fossilized internal organs for the species you are interested in, examples from other species could work as informative reference points.

Borealopelta - Wikipedia

There are many other forms of fossils that could lend meaningful data to DNA reconstruction. Fossilized eggs are common, and their size, shape, and composition provide clues. Sometimes, although rarely, these eggs contain fossilized embryos. Trace fossils like tracks and footprints add to the detail and give us information about the feet, the gait, and biomechanics that could be compared to that of birds. Even gastroliths (stomach stones) and coprolites (poop) offer mathematical and geometrical details. Scientists have also uncovered fossilized pigment structures that offer strong clues about the colors of skin, scales, and feathers. Given that artists’ representations of dinosaurs (paleoart) have been improving in scientific rigor for over a century, an AI system could also utilize pictorial and video CGI reconstructions of dinosaurs as references. Brain endocasts (casts of the interior of the braincase) are often recovered from fossil specimens, and they allow scientists to see the shape and size of the species’ brain. They also help paleoneurologists compare the proportional size of neuroanatomical structures to those in the brains of crocodilians and birds. You might be interested in what I have written about using AI to predict brain structure from endocasts here:

https://www.observedimpulse.com/2025/09/how-ai-could-be-used-to-reconstruct.html


Information about dinosaur behavior would also help, and thankfully scientists have been accumulating this for over a century. Careful study has supported very specific inferences about nesting, brooding, pack hunting, mating, sociality, intelligence, and much more. Fossil burrows, resting traces, feeding traces, and gnaw marks add to the resolution. There has also been lots of exacting scientific work on ontogenetic development, pathology, thermoregulatory physiology, isotopic signatures, paleobotany, and paleoenvironmental reconstruction. All this information helps us make assumptions about behavior, and comparing it to bird behavior allows us to constrain neurological and endocrine gene candidates. All taken together, we have an extraordinary amount of collateral evidence that can guide probabilistic reconstruction. Our multimodal agentic AI would not just “guess blindly” but would condition its genome drafts on a broad constellation of well-studied physical, ecological, physiological, and behavioral constraints.

This could lead to the discovery that certain dinosaurs had interesting soft tissue structures that did not fossilize. It is commonly pointed out that when artists reconstruct dinosaurs from their fossil bones, they basically "shrink wrap" the bones with skin, and that this may lead to the omission of parts that don't fossilize. If one were to look only at the bones in the tail of a beaver, one might miss the fact that it’s a giant paddle, grossly misrepresenting the animal. But the present approach could help in this regard. It’s possible that it could help recognize the existence of various soft tissue structures, such as dewlaps (extendable throat fans), expandable frills, casques, horns, head crests, inflatable throat sacs, wattles, snoods, combs, and back or tail crests.



Currently, scientists can and do make crude mathematical predictions about ancient DNA from living species. Most of the present research on resurrecting dinosaurs involves using phylogenetic methods to look for conserved and divergent sequences from birds to reconstruct a plausible common ancestor. Reconstructing the common ancestor of all birds, which researchers are working on now and which can only be done partially, would be helpful, but it wouldn’t carry us to extinct dinos. I think it will really help to squeeze the latent genetic information out of the bones themselves. And I looked: there is precedent for this kind of work. There is already published research where scientists have been able to make correct predictions about certain physical human traits, like face or bone shapes, from genes alone. However, I think the present method could take all this a lot farther.
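To give a feel for how phylogenetic ancestor reconstruction works, here is a sketch of the bottom-up pass of Fitch parsimony, one of the simplest such methods. Published ancestral genome reconstructions use far more sophisticated likelihood-based models, so this is an illustration of the idea rather than the method those studies used. The tree shape, species labels, and four-base aligned sequences are invented.

```python
def fitch_ancestral_states(tree, leaf_sequences):
    """Bottom-up pass of Fitch parsimony.

    tree is a nested tuple of leaf names, e.g. (("chicken", "ostrich"), "alligator").
    Returns, for each aligned site, the set of candidate bases at the root.
    """
    if isinstance(tree, str):
        return [{base} for base in leaf_sequences[tree]]
    left = fitch_ancestral_states(tree[0], leaf_sequences)
    right = fitch_ancestral_states(tree[1], leaf_sequences)
    states = []
    for a, b in zip(left, right):
        shared = a & b
        # Keep the bases both children agree on; if none, keep all candidates.
        states.append(shared if shared else a | b)
    return states

tree = (("chicken", "ostrich"), "alligator")
aligned = {"chicken": "ACGT", "ostrich": "ACGA", "alligator": "ACTA"}
root_states = fitch_ancestral_states(tree, aligned)
print(root_states)
```

Sites where all living species agree are reconstructed with confidence, while sites where they disagree stay ambiguous. This is why such methods recover conserved regions of an ancestral genome well but cannot by themselves describe a lineage that diverged from that ancestor and left no living descendants.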

It seems that no one has proposed a system that could learn to predict skeleton shape based on DNA coming from modern birds and crocodiles, and then use that knowledge in reverse to predict dinosaur DNA from fossil skeletons. I actually asked GPT, Gemini, Claude, and Grok to search the web and see if anything like what I am laying out here has already been proposed and they each reported, after many combined minutes of search, and several "deep research" sessions, that there is nothing like this on the internet and that it may be the best starting framework for resurrecting dinosaurs.

5.0 Birds and Reptiles Provide a Wealth of Genetic and Anatomical Data

Scientists use techniques such as comparative genomics and the molecular clock (among others) to map relatedness in vertebrates, and this gives our AI a huge family tree to reference. Remember that birds are technically dinosaurs, so we do have dino DNA, and this could go a long way toward predicting dinosaur genomes, especially certain kinds. Tyrannosaurus rex and Velociraptor, like all birds, were members of Theropoda, a group of bipedal, mostly carnivorous dinosaurs. In fact, birds, tyrannosaurs, and dromaeosaurs all belong to an even more specialized branch of theropods called Coelurosauria, which includes feathered dinosaurs and some of the most cognitively advanced species of the Mesozoic. Because birds are themselves coelurosaurian theropods, extinct lineages within Coelurosauria would be the most manageable for a bird-based proxy approach to resurrection. Of course, dromaeosaurs like the raptors, as well as other paravian species (troodontids), would be closer (and easier to piece together) than a T. rex. This is kind of cool for us because dromaeosaurs like Velociraptor and Utahraptor were likely among the most intelligent and agile of all dinosaurs. They are among the most interesting and recognizable dinos, and it is a nice coincidence that they would be among the easiest to bring back.

The last common ancestor of all birds lived roughly 90 to 100 million years ago, near the start of the Late Cretaceous. Scientists have employed sophisticated bioinformatics to trace back how bird genomes evolved through deep time, attempting to piece together this genome. But the common ancestor that birds shared with T. rex would have lived around 160 million years ago in the Jurassic. For Velociraptor this number is more like 150 million years ago. Unfortunately, that leaves lots of time for these lineages to have changed appreciably from the common ancestor they share with birds. But as you will see in this section, we have so much more than just birds to inform us.

Next, let's take a look at various dinosaurs and non-dinosaur contemporaries (marine and flying Mesozoic reptiles) and see which would be the easiest to model given their evolutionary distance from living reptiles and birds. This table summarizes the time that has passed since each group shared a common ancestor with a living animal, giving a proxy for relatedness. The easiest animals to bring back will be those at the bottom of the list, and perhaps that is where a project like this should start.

 

Animal                  Closest Living Relative   Time Since Divergence (Mya)   Geologic Period
Ichthyosaurs            Reptiles                  260                           Late Permian
Plesiosaurs             Turtles                   250                           Early Triassic
Pterosaurs              Reptiles                  240                           Middle Triassic
Ornithischian dinos     Birds                     240                           Middle Triassic
Sauropods               Birds                     230                           Late Triassic
Abelisaurs              Birds                     190                           Early Jurassic
Spinosaurs              Birds                     180                           Early Jurassic
Mosasaurs               Snakes                    170                           Middle Jurassic
Allosaurs               Birds                     170                           Middle Jurassic
Tyrannosaurs            Birds                     160                           Late Jurassic
Dromaeosaurs            Birds                     150                           Late Jurassic
Bird common ancestor    NA                        90                            Late Cretaceous

 

So as you can see, it is not just birds that can help in this regard. Birds and crocodilians are living archosaurs, a taxonomic group (clade) that dinosaurs are also nested inside. That means that crocs (alligators, caimans, gharials, and crocodiles) have much to offer as well. They offer an excellent contrast with the birds to help us interpolate about dinosaurs. So again, luckily we are not extrapolating from birds, we are interpolating between birds and crocodilians.

A diagram of different dinosaurs


Another helpful landmark, the bird-crocodilian split, happened around 250 million years ago, at the start of the Triassic. This split marks the common archosaur ancestor, and scientists have recently used advanced comparative genomics techniques to reconstruct an estimated 50% of its genome at 91% accuracy. Genomic comparison of bird and croc genomes identifies both deeply conserved elements and divergent innovations. Using genomes from birds, crocs, and other reptiles would help us reconstruct dinosaur gene order and chromosomal arrangement, as well as estimate regulatory architecture. It definitely helps that we have access to hundreds of sequenced genomes from living archosaurs, most of them birds and therefore coelurosaurs, to study and use as references.

 

Not a lizard nor a dinosaur, tuatara is the sole survivor of a  once-widespread reptile group

 

Data from lizards, snakes, and turtles will also contribute. Even reptiles like the tuatara, a "living fossil," offer further comparative opportunities given their genetic distance from other reptiles, their slow-evolving genome, and their ancient reptilian traits. In fact, there are many interesting species, including monotremes (egg-laying mammals), that could help constrain the baseline reptilian architecture upon which dinosaur genomic traits evolved. Take a look at the figure below that shows some of the amazing diversity of reptilian (sauropsid) forms that are present today and that could be harnessed to make historical projections about dinos. That featherless parrot already looks a lot like a tiny T. rex to me.

There are also some fascinating birds that lend comparative details. Large flightless birds like the cassowary are about as close as modern birds get to true dinos (their feet look just like the feet of the dinosaurs in your favorite Jurassic movie). Two species of giant moas (Dinornis robustus and Dinornis novaezealandiae) might provide some profound insights. These towering birds, the only birds without even a vestige of a wing, are now extinct, although they were hunted as recently as 500 years ago. Over the last decade ancient-DNA labs have recovered both mitochondrial and sizable nuclear genomes from several moa species, including the two “giant moa,” one of which stood taller than 11 feet and weighed well over 500 pounds. Because moa evolved some of the largest body masses ever achieved by birds (roughly 250 kg) independently of today’s ratites (ostriches, emus, cassowaries), their genomes are a natural experiment in avian gigantism. They give us a statistically powerful way to see which genetic routes birds can and cannot take when they achieve very large sizes. Thus, moas would supply our dinosaur-inference pipeline with needed genotype-phenotype pairs at the extreme end of body size.

A graph with a number of species


5.5 Other Related Techniques

Another source of information that I haven’t mentioned yet is knowledge of the skeletal forms or fossil remains of the ancestors of the species of interest. For instance, if you wanted to make predictions about the genome of a gorgonopsid (an early mammal-like reptile), you would want to use its skeleton, but you would also be interested in information about the relatives of the gorgonopsid. Understanding its phylogeny and incorporating information about the fossil skeletons of its predecessors, successors, and cousins would provide much relevant detail. Thus, analyzing and considering synapsid, pelycosaur, and therapsid remains in the gorgonopsid’s line of descent could tell you a lot about the constraints on the gorgonopsid genome. 


Yet another form of information that could help constrain a species’ genome is fossils of that species at different points in its lifetime. Growth and development in a tyrannosaur could be compared to growth in birds and used to inform developmental genetics. As mentioned earlier, these kinds of comparisons could even be extended to prehatched dinos as the embryonic or fetal remains of dinosaurs are sometimes found inside their shells.

 

Another completely different technique would be to forget about genetics and resurrection entirely and just train a system to generate imagery of dinosaurs. This would involve a machine learning system that learned to predict what a bird or reptile looks like based on its skeleton. That system could then be applied to dinosaur fossils. Thus it would receive and encode skeletons and use them to generate pictures of anatomy, body shape, and physical features. It could be trained using MRI scans of reptilian and avian bodies. Alternatively, skeletons could be matched with photographs of living animals, and a system like this could create pictures of the subject using a generative technique such as image diffusion. A system like this would help us picture how dinosaurs would have really looked, and its output could potentially be used as data to inform the genetic reconstruction. I have written more about how a system like this would work, here:


https://www.observedimpulse.com/2025/09/skull-to-face-using-ai-to-recreate-lost_9.html


Here is what it would look like if it were used to image faces from skulls.



It is worth mentioning that a process like this could be used to approximate the faces of our hominin ancestors. Neanderthal, Denisovan, Homo floresiensis, and other skulls could be entered into an AI model after the model has been trained on human and ape skull/face pairings. This could be combined with other techniques to predict the facial features of ancient humans, another sight I had long thought lost to time before the advent of modern AI.

[Image: a diagram of a person's face]

 

Next, let's talk about an alternate approach to resurrection. Some scientists have been attempting to take living birds and change specific genetic features to create a throwback look. Jack Horner (the inspiration for Dr. Grant in the Jurassic Park series and technical advisor on the films) has a project to create a “chicken-o-saurus.” This project is aimed at creating a modified chicken that expresses dormant dinosaur-like traits. Dr. Horner wants to use gene-editing tools like CRISPR to reverse the more recent genetic changes that made birds less like dinosaurs. He envisions a chicken with teeth, a long tail, arms with clawed hands, and a rounded snout rather than a beak. I imagine we would also want to remove feathers, the keel on the sternum, and the pygostyle (fused tail vertebrae).


Searching for dormant dinosaur genes in birds in this way could result in the creation of very dinosaur-like birds. Scientists working along similar lines went on to make a beakless chicken with a snout that looks very dinosaur-like. To accomplish this, they identified a pattern of gene activity related to facial development that appears in birds but not in other animals. They used inhibitors to suppress this activity in embryonic chickens and, as you can see, the resulting bird faces appear much more like those of their distant dinosaur ancestors.

[Image source: “Chickenosaurus: How Genetically Engineered Theme Park Monsters Could Soon Be A Thing,” BEYONDbones]

There are several other de-extinction projects underway now, but they all involve animals whose genomes have been recovered from their remains because they went extinct recently. Furthermore, none of these projects produces true clones or resurrections. They all involve taking a related animal and changing a few genes (similar to the chicken-o-saurus concept).


For example, Colossal Biosciences claims to be bringing animals such as the woolly mammoth, the thylacine, and the dire wolf back from extinction. In reality, the company is taking the extinct genome as a reference and then editing genetic sites in the closest living relative to make a proxy animal with key traits. To “create” a woolly mammoth, they are giving Asian elephants cold-resistant attributes such as fur, increased fat, altered hemoglobin, and smaller ears. This is achieved by multiplex gene editing. To create the “dire wolves,” they edited 20 sites across 14 genes in gray wolf DNA using sequences inferred from ancient dire wolf remains and then cloned the edited cells. Why don’t these companies just build an animal around the recovered genome? As the next section will explain, that is just too far beyond today’s technology.


6.0 Dinosaur Embryology

Even if the present method resulted in a full, viable, synthetic Tyrannosaurus rex genome on a computer, bringing it to life would involve a complex series of technological steps. First, it would have to go from zeros and ones on a computer to an actual DNA molecule, a long linear polymer perhaps billions of letters long. The genetic code would have to be synthesized in segments and stitched together using methods like Gibson assembly or yeast artificial chromosome (YAC) construction. So far, something like this has been done for bacterial genomes and yeast chromosomes, but whole-genome synthesis has not been achieved for any animal. In other words, the technology to assemble a molecule of this size does not yet exist. But let's say that we could build a genome de novo. Even then, there would still be major roadblocks.

The synthetic genome would have to be inserted into a host cell, such as an enucleated bird ovum. This could be done via somatic cell nuclear transfer (SCNT), similar to the method used for Dolly the sheep. It is worth mentioning that even though several mammals have been cloned, scientists are still unable to clone a bird. The machinery of the cell that is hosting our T. rex genome must recognize it and properly express its proteins, helping it build a body. There must be no mismatches or incompatibilities with the cellular (cytoplasmic) environment or with host mitochondrial DNA. This ovum would then need to be implanted within a surrogate egg (even an ostrich egg is estimated to be three to four times smaller than a T. rex egg) or an artificial womb. Incubation conditions would have to align precisely with those expected by a T. rex in embryological development.

After hatching, the dinosaur would require intensive care: a proper diet, temperature, and humidity, as well as parental, brood, and social interaction. You can see how difficult this would be. Our technology is not there, and not even close right now, but of course it is possible that AI may change this rather rapidly. Let’s also keep in mind that using the present method to create information about dinosaur genomes has value outside of de-extinction, such as deepening our understanding of biology and evolution.


7.0 What Other Animals Could Be Modeled Using this Framework?

Dinosaur genes must be inferred because no truly intact, sequence-quality dinosaur DNA has ever been recovered, and likely none ever will be. The only genetic traces extracted from dinosaur fossils are highly degraded molecules (possible chromosome fragments, collagen sequences, and chemical DNA markers) found inside exceptionally preserved dinosaur cartilage or bone. Even these findings are controversial and nowhere near the quality needed for genome sequencing or “de-extinction.” Experiments on ancient bones put the half-life of DNA at about 521 years at 13 °C. Even at an ideal preservation temperature of around −5 °C, the same decay model predicts that effectively every bond would be destroyed after about 6.8 million years. Unfortunately, non-avian dinosaurs went extinct 66 million years ago, roughly ten times beyond that limit. In fact, retrieving an entire genome from the fossil remains of any species becomes very difficult after 100,000 years. After one million years, even if preserved by very cold or dry conditions, any DNA that is recovered will be fragmentary.
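As a rough check on the scale of the problem, the 521-year half-life quoted above can be plugged into a simple exponential decay model. This is only a sketch under stated assumptions: it treats the half-life as constant and applies the 13 °C figure to every sample, whereas real decay rates depend strongly on temperature and burial conditions. The function name and the sample ages are illustrative.

```python
import math

HALF_LIFE_YEARS = 521  # half-life quoted in the text (at about 13 °C)


def log10_fraction_intact(age_years, half_life_years=HALF_LIFE_YEARS):
    """Return log10 of the expected fraction of DNA still intact.

    Exponential decay: N(t) / N0 = 0.5 ** (t / half_life).
    Working in log space avoids floating-point underflow for very old samples.
    """
    half_lives_elapsed = age_years / half_life_years
    return half_lives_elapsed * math.log10(0.5)


# Illustrative ages only; colder sites would decay more slowly than this.
for label, age in [
    ("100,000 years", 1e5),
    ("1 million years", 1e6),
    ("66 million years", 66e6),
]:
    exponent = log10_fraction_intact(age)
    print(f"{label}: roughly 10^{exponent:.0f} of the original DNA intact")
```

Whatever the exact conditions, the exponent for a 66-million-year-old sample is so large and negative that nothing sequenceable is expected to remain, which is why the genome has to be inferred rather than recovered.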



The present technique is not exclusive to dinosaurs and could be applied to any extinct animal or plant (and possibly other kingdoms of life as well). Candidates include ancient animals such as trilobites, eurypterids (sea scorpions), giant dragonflies, and ammonites; Mesozoic marine reptiles such as plesiosaurs, mosasaurs, and ichthyosaurs; Pleistocene megafauna such as woolly rhinoceroses, cave lions, and giant ground sloths; early mammal-like reptiles (synapsids) such as pelycosaurs and cynodonts; and human ancestors and relatives such as australopithecines, Homo erectus, and Homo heidelbergensis. It should even work on plants and fungi, because we have many fossils of ancient plants. However, it is unclear whether a technique like this could meaningfully stretch back 500 million years to Cambrian animal fossils like Haikouichthys, Anomalocaris, and Hallucigenia in the absence of close modern relatives.

[Image: a diagram of a DNA sequence]

The fact that DNA can survive 10,000 years or more under favorable conditions means that animals such as dodos, thylacines (marsupial tigers), and woolly mammoths would not need the technique I am introducing here to be cloned and resurrected. Here is a list of recently extinct animals, most of which could be recovered using their actual DNA. The table gives their approximate extinction date, closest living relative, and DNA status. The nearest relatives could be very important in providing information related to embryology, methylation patterns, and healthy development.

Chronological De-Extinction Candidate Table with DNA Status

Species | Approx. Extinction (years ago) | Closest Living Relative | DNA Status
Thylacine | 90 | Numbat / Tasmanian devil | Yes. High-quality nuclear genome recovered; near-complete.
Passenger Pigeon | 111 | Band-tailed pigeon | Yes. Complete reference genome assembled (Revive & Restore).
Quagga | ~142 | Plains zebra | Yes. Partial nuclear genome; mtDNA complete; recoverable via zebra reference.
Great Auk | ~180 | Razorbill / puffin | Yes. Nuclear genome sequenced from museum skins; coverage improving.
Dodo | ~330 | Nicobar pigeon | Yes. Nuclear genome reconstructed from museum material; ongoing refinement.
Aurochs | ~400 | Domestic cattle | Yes. Draft genome assembled.
Moas | ~500–600 | Kiwi | Yes. Several moa species have nuclear genomes from ancient DNA.
Elephant Bird | ~1,000 | Kiwi / ostrich relatives | Yes. High-quality nuclear genomes from eggshell DNA.
Woolly Mammoth | ~4,000 | Asian elephant | Yes. Multiple high-coverage genomes from permafrost specimens.
Irish Elk | ~7,700 | Fallow / red deer | Yes. Partial genome fragments; recovery feasible with enrichment.
Dire Wolf | ~9,500 | Gray wolf | Yes. High-coverage genome sequenced (clarified distinct lineage).
Giant Ground Sloth | ~10,000 | Tree sloths | Yes. Medium-coverage genome from subfossil material; recoverable but fragmentary.
Saber-tooth Cat | ~10,000 | Modern big cats | No. DNA not recoverable (asphalt fossils destroy molecules).
Woolly Rhinoceros | ~10,000 | Sumatran rhino | Yes. High-coverage genomes from Siberian permafrost.
Short-faced Bear | ~11,000 | Spectacled / brown bears | Yes. Low-coverage nuclear DNA recovered; potentially improvable.
American Mastodon | ~11,000 | Asian / African elephants | Yes. Low-coverage genome available; additional data possible.
Cave Lion | ~14,000 | Modern lion / tiger | Yes. Multiple high-coverage genomes from frozen cubs.
Megalania | ~40,000 | Komodo dragon | No. DNA unconfirmed; subfossil material may yield short fragments.
Wonambi | ~40,000 | Modern pythons | No DNA recovered; fossils too mineralized for genome recovery.


This next table contains a list of extinct hominins, including human ancestors and close relatives. Only two have had their genomes reconstructed, and the genomes of most of the others are probably lost to time. The table gives their approximate extinction date, the time at which they diverged from our own lineage, their epoch, and their DNA status. Clearly, the more recent species with more recent divergence dates would be easier to model and potentially resurrect using the techniques discussed here.

 

Extinct Hominins and Human Ancestors: Chronological Table with DNA Feasibility

Species | Approx. Extinction (years ago) | Divergence from H. sapiens (Mya) | Epoch / Period | DNA / Genome Status | Feasibility & Notes
Homo neanderthalensis | ~40,000 | ~0.6–0.8 | Late Pleistocene | Yes. Multiple high-coverage genomes | Fully sequenced; interbred with modern humans
Homo denisova | ~40,000 | ~0.6–0.8 | Late Pleistocene | Yes. Multiple high-coverage genomes | Distinct branch sister to Neanderthals
Homo floresiensis | ~50,000 | ~1.8 | Late Pleistocene | No DNA but possible | Derived from early erectus; diminutive island species.
Homo luzonensis | ~67,000 | ~1.8 | Late Pleistocene | No DNA but possible | Possibly descended from early Asian Homo lineages.
Homo erectus | ~117,000 | ~1.8–2.0 | Late Pleistocene | No DNA. Likely gone | One of the longest-lived human species
Homo naledi | ~240,000–330,000 | ~2.0 | Middle Pleistocene | No DNA. Likely gone | Surprisingly recent species with small brain
Homo heidelbergensis | ~200,000–300,000 | ~0.8–1.0 | Middle Pleistocene | No DNA. Likely gone | Transitional ancestor to Neanderthals and modern humans.
Homo antecessor | ~800,000 | ~1.0 | Early Pleistocene | No DNA. Likely gone | One of the oldest Europeans
Homo habilis | ~1.6–2.3 million | ~2.1 | Early Pleistocene | No DNA. Likely gone | First “tool-maker” of the genus Homo.
Paranthropus boisei / robustus | ~1.0–1.2 million | ~2.5 | Early Pleistocene | No DNA. Likely gone | Robust chewing lineage
Australopithecus afarensis | ~3.0 million | ~3.0–3.5 | Pliocene | No DNA. Likely gone | Bipedal; transitional ape–human morphology
Ardipithecus ramidus | ~4.4 million | ~4.5–5.5 | Early Pliocene | No DNA. Likely gone | Facultative biped; earliest well-known hominin anatomy.
Sahelanthropus tchadensis | ~7.0 million | ~6.8–7.0 | Late Miocene | No DNA. Likely gone | Possibly the first species after the chimp–human split (~7 Ma).


8.0 Weaknesses of This Approach

The method I have introduced here will not unearth the actual genomes; it will only make highly informed guesses about them. Even a highly advanced AI system will not be able to reproduce sequences where evolution introduced significant novelties, lineage-specific adaptations, or regulatory rewiring that has no modern parallel. All of the genetic mutations and adaptations that dinosaurs acquired after diverging from other reptiles, and that are not found in birds, are lost to time. Furthermore, such a synthetic genome could result in a visually compelling likeness or an uncanny simulacrum of artistic renderings but may largely fail to reproduce internal regulation. The process could result in animals that look like the dinosaurs in the movies but whose physiology and even behavior are closer to those of birds or crocodiles. This would risk creating a "chimeric reconstruction" rather than a resurrection.

Chromosome number further confuses things. Some birds have 40 chromosomes, others have over 140, and it is anyone’s guess how many T. rex had. Regulatory sequences (e.g., promoters, enhancers, and silencers) control when, where, and how much genes are expressed. They are often species-specific and evolve rapidly. A T. rex might have had unique enhancers for muscle growth or bone density that no longer exist in its living relatives. Epigenetic modifications (DNA methylation and histone modification) also influence gene activity without altering the DNA sequence. These marks decompose along with the DNA, so AI would have to hypothesize working epigenetic profiles based on modern analogs, adding further uncertainty. Non-coding DNA (DNA that does not code for proteins, e.g., introns, regulatory elements, and so-called junk DNA) also poses an issue. In humans it comprises roughly 98-99% of the genome, and in general it evolves differently from coding regions, often contains lineage-specific adaptations, and lacks clear genotype-phenotype correlations. The present skeletal morphology technique could capture coding genes linked to bone structure, but non-coding DNA’s role in overall genome stability, gene regulation, and phenotypic variation would remain unaddressed.

It is also important to mention that there are serious ethical and ecological concerns at play here outside the scope of this entry. For instance, humans artificially selected pug dogs to have a collapsed snout because they liked the way it looked; however, this made it difficult for the dogs to breathe. There are many examples of domestication creating disease states. These examples make it clear that an engineered dinosaur could be born into an uncomfortable, painful, or diseased body. Hollywood has already pointed out many of the ethical quandaries of de-extinction including animal cruelty, human safety, and invasive ecological concerns.  However, at the same time, de-extinction science overlaps greatly with conservation science and many de-extinction techniques can be used to help present day animals that are on the verge of extinction.

Currently, you cannot fit an entire vertebrate genome into the context window of a transformer-based neural network. This means that the model cannot take the entire genome into account when making predictions and that some long-range dependencies may not be recognized. Bird genomes contain around 1 billion base pairs (1.0 to 1.3 gigabase pairs, or Gb), and crocodile genomes contain around 2 to 3 billion. ChatGPT can only hold about 128,000 tokens at a time, and Google Gemini can hold around one million. That means that the context window needs to be over 1,000 times bigger. The industry has seen context length doubling every 18 to 24 months; a 1,000-fold increase takes about ten doublings, so at this rate it would be roughly 15 to 20 years before a transformer’s context window can encompass all of the DNA in question. Of course, there are many ways to get around this even today (preprocessing the genome into embeddings, prioritizing known relevant loci, and using hierarchical architectures), but this is just one example of the fact that as technology progresses this idea becomes more feasible.
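The back-of-the-envelope estimate in this paragraph can be made explicit. The sketch below assumes one token per base pair and a steady doubling of context length, both of which are simplifications, and the function name and parameters are illustrative rather than part of any real library.

```python
import math


def years_until_window_fits(genome_bp, window_tokens, doubling_years,
                            tokens_per_bp=1.0):
    """Estimate years until a context window can hold a whole genome.

    Assumes the window doubles every `doubling_years` years and that each
    base pair costs `tokens_per_bp` tokens (an assumption; real tokenizers
    for DNA may pack several bases into one token).
    """
    tokens_needed = genome_bp * tokens_per_bp
    doublings = math.log2(tokens_needed / window_tokens)
    # A window that is already large enough needs no further doublings.
    return max(doublings, 0.0) * doubling_years


# Figures taken from the paragraph above.
genomes = {"bird (~1 Gb)": 1.0e9, "crocodile (~2.5 Gb)": 2.5e9}
windows = {"128k tokens": 128_000, "1M tokens": 1_000_000}

for genome_name, genome_bp in genomes.items():
    for window_name, window_tokens in windows.items():
        fast = years_until_window_fits(genome_bp, window_tokens, 1.5)
        slow = years_until_window_fits(genome_bp, window_tokens, 2.0)
        print(f"{genome_name}, starting from {window_name}: "
              f"{fast:.0f} to {slow:.0f} years")
```

The result is sensitive to the assumptions: a tokenizer that packs several bases into each token, or a faster doubling rate, would shorten the estimate considerably.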

The jump from predicting skeletal morphology to generating functional genomes capable of producing viable organisms is a significant one. The mappings are highly nonlinear and dependent on environmental context. Moreover, a single dinosaur skeleton is compatible with many plausible genomes, and there is no way to test whether any of them is accurate. Validating which of them could be biologically viable could also be very difficult. One of the most sobering truths about this enterprise is the difficulty inherent in embryology. To progress from a zygote to an embryo, to a fetus, and then to a healthy young animal, the genetic blueprint must be incredibly internally consistent. This is easy for nature to accomplish, but just having an AI dream up (or worse, hallucinate) an animal genome gives little reassurance that there will not be structural inconsistencies due to the tremendous complexity of gene interactions. Everything must work together, and work just right, to avoid developmental failure. Of course, this is a problem that a far-future superintelligence could solve, but it won’t be solved using the three-step pipeline outlined above.

But this pipeline may be more useful than it first seems, and it generalizes outside of genetics. The three-step methodology outlined here is not just suited for biology. It could be used as a general framework for cross-domain translation, where one set of observable features (like morphology) is used to infer a related but unobservable set (like genetics) via a learned latent manifold. In fact, this method (train forward, invert, apply to unknowns) is a flexible blueprint for abductive reasoning via deep representation learning, and it could have enormous potential in chemistry, physics, and even psychology. You would use it in cases where you have A, B, and C, but not D, and A is to B as C is to D. We could call it "bidirectional manifold mapping for latent inference." It learns to model a latent manifold that encodes the “grammar” of a domain. Once that space is shaped well enough, inverting across it becomes a powerful general inference engine.
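To make the "train forward, invert, apply to unknowns" pattern concrete, here is a deliberately tiny sketch. It is not the proposed architecture: it assumes a purely linear, noise-free relationship between two hidden "genome-like" features and three observable "skeleton-like" measurements, whereas the real mapping would be nonlinear and learned by a deep network. All names and numbers (`TRUE_W`, `fit_forward`, `infer_latent`) are hypothetical and chosen only for illustration.

```python
import random


def solve_2x2(a, b):
    """Solve the 2x2 linear system a @ x = b using Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    if abs(det) < 1e-12:
        raise ValueError("singular system; the mapping cannot be inverted")
    x0 = (b[0] * a[1][1] - a[0][1] * b[1]) / det
    x1 = (a[0][0] * b[1] - b[0] * a[1][0]) / det
    return [x0, x1]


def predict(w, latent):
    """Forward map: observation = latent @ W (W has 2 rows, m columns)."""
    return [sum(latent[i] * w[i][j] for i in range(2)) for j in range(len(w[0]))]


def fit_forward(latents, observations):
    """Step 1 (train forward): least-squares fit of W from paired examples."""
    # Normal equations, one observed column at a time: (X^T X) w_j = X^T y_j
    xtx = [[sum(x[i] * x[k] for x in latents) for k in range(2)] for i in range(2)]
    m = len(observations[0])
    cols = []
    for j in range(m):
        xty = [sum(x[i] * y[j] for x, y in zip(latents, observations))
               for i in range(2)]
        cols.append(solve_2x2(xtx, xty))
    # cols[j] holds column j of W; transpose into 2 rows of length m.
    return [[cols[j][i] for j in range(m)] for i in range(2)]


def infer_latent(w, observation):
    """Steps 2 and 3 (invert, apply): latent minimizing ||latent @ W - obs||^2."""
    m = len(w[0])
    # Normal equations for the inverse problem: (W W^T) x = W y
    wwt = [[sum(w[i][j] * w[k][j] for j in range(m)) for k in range(2)]
           for i in range(2)]
    wy = [sum(w[i][j] * observation[j] for j in range(m)) for i in range(2)]
    return solve_2x2(wwt, wy)


# Hypothetical ground truth standing in for biology (unknown in practice).
TRUE_W = [[1.0, 0.5, -0.3], [0.2, -1.0, 0.8]]

# "Extant species": both the hidden features and the observations are known.
rng = random.Random(0)
train_latents = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(50)]
train_obs = [predict(TRUE_W, x) for x in train_latents]
w_hat = fit_forward(train_latents, train_obs)

# "Fossil": only the observation is available; the hidden features are inferred.
unknown_latent = [0.4, -0.7]
fossil_obs = predict(TRUE_W, unknown_latent)
recovered = infer_latent(w_hat, fossil_obs)
print("inferred hidden features:", [round(v, 4) for v in recovered])
```

In this toy setting the inversion is well-posed because there are more observed measurements than hidden features. With real genomes the opposite holds, which is why many different genomes can be consistent with the same skeleton and why priors from living relatives matter so much.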

9.0 Conclusion

One of the fondest memories I have is of reading Michael Crichton’s Jurassic Park in third grade, before the first movie came out, and marveling at the idea of extracting dino DNA from a mosquito trapped in amber. Today, we know any DNA contained within those mosquitoes has completely decomposed, but it sure felt elegant at the time. Even without DNA recovery, AI promises a new kind of “virtual paleogenetics,” a way to infer and simulate the genomes and physiologies of extinct organisms using bioinformatics, sophisticated prediction, and comparative analysis. This is due in part to another scientific concept that enraptured me as a child (and the original Jurassic Park novel got right), that birds are dinosaurs.

Despite the major hurdles, the method outlined here could be a good starting point for several reasons. Creating the first genome-to-skeleton bidirectional models would produce incremental value at each step, advancing the understanding of genotype-phenotype interactions. Even without resurrection, these AI-generated dino genomes could allow synthetic cell lines for studying dinosaur biochemistry in vitro, organoid models, or simulations of growth curves, muscle structure, and thermoregulation.

The present method would only produce a synthetic “consensus dinosaur genome,” not a true historical sequence. It would be educated guesswork, but firmly grounded in real data. It could even result in living and breathing organisms that look how we expect dinosaurs to look, although they may not function or act like the real thing. I believe that creating a Jurassic Park with cloned approximations will be possible when methods like those discussed here are in the hands of superintelligent AI agents. In fact, I wouldn’t be surprised if this fantasy could be brought to life within our lifetimes.

 


Note: I started writing this blog after having a strong sense that artificial superintelligence could achieve dinosaur resurrection, and I wanted to provide a description of that vision. But after posting the blog entry, I knew something was missing. So I sat on the couch for 10 minutes and racked my brain, telling myself over and over that there was something important I was missing. And then somehow the idea for the three-step pipeline entered my mind basically fully formed. It felt like the idea just materialized, possibly from unconscious incubation, because there was next to zero reasoning involved. It’s not going to be easy to implement, but I do think it could reach and harness key latent information hidden in dinosaur fossils.