Friday, July 18, 2025

How to Use AI to Reconstruct Dinosaur Genomes from Bird DNA and Skeletons


Jared Edward Reser Ph.D.

7/18/25

1.0 Introduction

I have dreamed of witnessing a real-life dinosaur since childhood, but the fact remains that we may never get our hands on original dinosaur DNA. Unfortunately, the genetic traces (even those trapped in amber) completely degraded tens of millions of years ago, a major disappointment for dino aficionados and Jurassic Park fans. But as I have been watching the advance of artificial intelligence over the last couple of years, I have realized that AI creates new possibilities. Modern machine learning has an astounding capacity for prediction, for finding hidden connections, and for cross-referencing. Generative AI excels at constructing domain-relevant data structures when provided sufficient context and training examples. And agentic AI is being programmed to plan, organize, and automate complex work at scales outside the abilities of people. Now in 2025, it is rather easy to imagine a superintelligent AI in the future tinkering until it brings back a guesswork-based, life-sized biological recreation of a Tyrannosaurus rex. So, even if creating precise clones of long-extinct creatures is impossible, it does seem possible that advanced AI agents will, in the coming years, create surprisingly accurate guesswork reconstructions of dinosaur DNA.

This entry will discuss the information available to a superintelligent AI to help it piece together the clues necessary to do this. It will also propose a three-step pipeline using machine learning to generate a synthetic dinosaur genome. This pipeline leverages existing data to gather information about something we don’t have, dinosaur DNA. Essentially, we use data from modern birds and reptiles, where we have both the DNA and the bones, to train a system to recognize how genes shape skeletons. Then, by giving this trained system a dinosaur skeleton, we can ask: “What kind of genes would have built this?” In essence, it uses paired genomic and skeletal data from extant birds and reptiles to extract latent genetic signals and learn structure-function relationships, which we then apply to fossilized dinosaur skeletons to infer their genomic makeup.

Remember those analogical reasoning questions from the SAT? They would state two pairs of related things and then ask you how they share a common relationship. Here is an example:

“Blueprint is to building as genome is to body plan.”

All these types of questions would share the pattern A:B::C:D. Well, the technique laid out here shares that analogical structure. If A (bird skeleton) is to B (bird DNA), as C (dino skeleton) is to D (dino DNA), then what does D equal? Bird genomes are related to bird skeletons (both high dimensional vectors) by a complex analogy (a mathematical function). The math behind that analogy is beyond human capability to derive, but is tractable for machine learning. That is because it is less conceptual and more computational. But, once the machine has quantified that analogy it can apply it to dinosaur fossils to reason about dino DNA. This is because the way a genome produces a skeleton in birds is analogous to how a genome must have produced a skeleton in dinosaurs. The full machine learning pipeline is depicted by the figure in Section 3 below.

A dinosaur running through a computer server

2.0 The AI System that Would Be Necessary

Before we get into specifics, let’s use this section to discuss the generalities. Let’s describe a general multimodal AI system that could reverse-engineer dinos by blending paleontology, comparative genomics, structural biology, and molecular AI. To make a synthetic approximation of a dinosaur genome, I have been envisioning an autoregressive, attention-based, neural network AI that is based on the transformer architecture. Such a system could be trained on genomes, especially genomes similar to the species that the scientist is trying to resurrect. For dinosaurs, this would be birds, crocodilians, and various other reptiles (sauropsids). Think of it like GPT for genes. Instead of predicting the next word, this system would predict the next nucleotide. In other words, you enter a sequence of DNA and then it performs statistical autocomplete to make assumptions about what could come next. There is already research, and proofs of concept, in this vicinity.
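To make the "statistical autocomplete" idea concrete, here is a deliberately tiny sketch. It is not a transformer; it swaps in a simple k-mer counting model, which is about the crudest possible next-nucleotide predictor. The toy sequences and the function names (train_kmer_model, predict_next, autocomplete) are invented for illustration and do not come from any real genomics library.

```python
from collections import Counter, defaultdict

def train_kmer_model(sequences, k=3):
    """Count which nucleotide follows each k-base context in the training sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for i in range(len(seq) - k):
            counts[seq[i:i + k]][seq[i + k]] += 1
    return counts

def predict_next(counts, context, k=3):
    """Return the most frequent nucleotide seen after the last k bases, or None if unseen."""
    following = counts.get(context[-k:])
    if not following:
        return None
    return following.most_common(1)[0][0]

def autocomplete(counts, prompt, n_new, k=3):
    """Greedily extend the prompt one nucleotide at a time."""
    seq = prompt
    for _ in range(n_new):
        nxt = predict_next(counts, seq, k)
        if nxt is None:  # context never seen in training, so stop
            break
        seq += nxt
    return seq

# Hypothetical toy "genomes"; real training data would be whole bird and croc genomes.
toy_genomes = ["ATGCGATGCGATGCGA", "ATGCGTTGCGATGCGA"]
model = train_kmer_model(toy_genomes, k=3)
completed = autocomplete(model, "ATG", n_new=5, k=3)
print(completed)
```

A transformer replaces the lookup table with learned attention over a far longer context, but the training objective (predict the next base given what came before) is the same in spirit.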

Existing software tools that use transformers to make predictions about human genes include DNABERT, Nucleotide Transformer, and Enformer. Just two weeks ago Google DeepMind released a transformer-based genomic AI called AlphaGenome. It’s not the first, but it is currently the best glimpse into the applicability of AI for genomics. AlphaGenome is DeepMind’s powerful new AI model, designed to interpret long DNA sequences (up to 1 million nucleotides: As, Cs, Ts, and Gs) with base-pair precision. You give it a DNA sequence, and it can predict important features like where genes start and stop, how genes interact, and how active certain regions are in different cells. It helps scientists better understand how our genome works, how genetic changes might cause disease, and how we might design or edit DNA more intelligently. There are a few reasons why the generative pretrained transformer (GPT) architecture works well for genome prediction, as it does for language. Genomes contain long-range dependencies that resemble linguistic grammar and narrative structure. This allows such a system to recognize important patterns in DNA, or any long sequence, that would be invisible to a human.

Overall, a system like this might be able to help make educated guesses about bird DNA if it is trained on hundreds of bird and croc genomes to learn the relevant long-range gene dependencies, regulatory motifs, and intron-exon patterns. The system could also be trained to access and reference lots of other data we have on birds and crocs. Thus, the AI system would have to be multimodal, taking information and building context from as many sources as possible. Such a model, trained on existing (extant) animals, would provide guardrails for producing hypothetical birds (class Aves). However, to produce a specific dinosaur, it would need as much information as possible about the dinosaur species of interest. Such a system, if empowered by an AGI agent (and such agents are developing rapidly), could be prompted with dinosaur fossil features and phylogenetic position to generate probabilistic genome drafts. The genomes would be conditionally generated rather than inferred by genetic (or phylogenetic) parsimony. Such a system could make edits and revisions, iteratively manipulating a conjectural genome to get closer and closer to a reverse-engineered dinosaur.

Just as AlphaFold revolutionized 3D protein folding with transformers, models like the one described here might be capable of generating functional blueprints of extinct species. Given enough funding, computing power, and research, a system like this could theoretically produce a genetic sequence capable of being turned into an animal. However, keep in mind that even if we had a functional genome for a dinosaur, it would still be very difficult to grow or clone it. In fact, as we will see, this proposed project is much more difficult than it sounds, and is likely beyond the abilities of a human or even teams of humans. Still, after reading this, you might become convinced that a future superintelligence could tackle it.

So far, this idea is very general and nebulous. We have a “multimodal superintelligent AI” trained on birds and reptiles, that somehow takes context from a specific dinosaur species and uses that to infer genetic sequences. But let’s get more specific and focus on what information exists within dino skeletons and how it could be extracted.

3.0 Training an AI System to Predict Dino Genomes Based on Bird Skeletons

My proposed method would start by pretraining an AI (machine learning neural network model). The model would be taught from examples to predict skeletal anatomy from whole-genome data from living birds and crocodilians. Basically, researchers would give the computer a bird genome as input and teach it to predict the body shape. Like other machine learning systems, it would strengthen the connections responsible for correct predictions and weaken the connections responsible for incorrect ones using backpropagation. This would require linking the variation in genetic sequences to measurable phenotypic differences in the bones (size, shape, proportions, articulation geometry).

But the model connecting genes to bones would need base knowledge before training. It wouldn’t work to start off by letting a chicken genome predict an entire chicken skeleton. That works for GPT because it only predicts the next word, but it will never work with something as complex as a skeleton. The system needs strong priors, so we would have to start with individual models for genes and skeletons and then connect them. The pretrained DNA foundation model would be trained on thousands of genomes across birds and reptiles to help it compress genomes into a meaningful embedding. Then a separate model would have to be pretrained on 3D skeletal shapes. The skeletal AI system would look at things like how limb bones scale across species, what structural features co-occur (femur angle with pelvic width), and general anatomical logic. This could involve a pretrained shape encoder that is trained on thousands of CT scans (or mesh landmarks) from birds and reptiles. Then these two models (genes and bones) would be aligned and trained as a joint model (e.g. contrastive learning, variational inference, diffusion bridge) to associate genome embeddings with morphology embeddings. This would be like teaching someone who already speaks fluent “genome-talk” and fluent “anatomy-talk” to become a translator between them instead of having to learn both languages from scratch at the same time. Merging the models could be done with relatively modest supervised training data, because the hard work of representation would have already taken place in pretraining.
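Of the alignment options mentioned above, contrastive learning is the easiest to sketch. The snippet below is a minimal illustration, assuming each species already has a genome embedding and a skeleton embedding from the two pretrained encoders. The embeddings here are random stand-ins, and contrastive_loss is a hypothetical helper written for this example (an InfoNCE-style loss in the genome-to-skeleton direction only), not part of any existing model.

```python
import math
import random

def normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def contrastive_loss(genome_embs, skeleton_embs, temperature=0.1):
    """InfoNCE-style loss. Index i in both lists is assumed to be the same species."""
    g = [normalize(v) for v in genome_embs]
    s = [normalize(v) for v in skeleton_embs]
    total = 0.0
    for i, gi in enumerate(g):
        logits = [dot(gi, sj) / temperature for sj in s]
        m = max(logits)  # subtract the max for numerical stability
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]  # negative log-probability of the true pair
    return total / len(g)

random.seed(0)
# Random stand-ins for the outputs of the two pretrained encoders (assumption).
genome_embs = [[random.gauss(0, 1) for _ in range(16)] for _ in range(8)]
matched = [[x + random.gauss(0, 0.01) for x in v] for v in genome_embs]
mismatched = matched[::-1]  # pair every genome with the wrong skeleton

loss_matched = contrastive_loss(genome_embs, matched)
loss_mismatched = contrastive_loss(genome_embs, mismatched)
print(loss_matched, loss_mismatched)
```

Training would adjust the encoders (or small projection layers on top of them) to push this loss down, so that a species’ genome embedding lands near its own skeleton embedding and far from everyone else’s.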

This project would necessitate the genomes and naked skeletons of many reptiles and birds to build a library of examples (a genotype-to-phenotype training corpus). This may necessitate genome and skeleton data from thousands of extant birds and crocodilians. There are over 10,000 species each of reptiles and birds, and this is great, because the more data the better. As of February 2022, genomes have been completed for over 540 species of birds, including at least one from every bird order. It is highly possible that other vertebrate (chordate) data, like fish, amphibian, and mammalian data, could strengthen the model. Either way, many animals’ genomes would have to be sequenced because numerous training examples would be needed. Distantly related species should be emphasized so the system can capture the diversity of forms. However, the system would also benefit from being exposed to intraspecies diversity, meaning that it would help to have numerous samples from the same species. The more, the better.

The genome is already a linear sequence of nucleotides ready for machine learning. The skeletons are 3D objects so they would have to be 3D mapped in a computer and broken down quantitatively into strings of numbers so that the AI can ingest them. This is addressed by the field of quantitative morphometrics and there are already large existing datasets. The morphological traits would be encoded as embeddings in a multidimensional embedding space where similar traits are located near each other.
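As a rough illustration of turning a bone into "strings of numbers," the sketch below takes a handful of 3D landmark points, removes position and overall size, and flattens the result into a feature vector. This is only the centering and scaling part of a Procrustes-style alignment; a full morphometric workflow would also remove rotation. The landmark coordinates and function names are made up for the example.

```python
import math

def normalize_landmarks(landmarks):
    """Center 3D landmarks on their centroid and scale them to unit centroid size."""
    n = len(landmarks)
    centroid = [sum(p[axis] for p in landmarks) / n for axis in range(3)]
    centered = [[p[axis] - centroid[axis] for axis in range(3)] for p in landmarks]
    centroid_size = math.sqrt(sum(c * c for p in centered for c in p))
    return [[c / centroid_size for c in p] for p in centered]

def to_feature_vector(landmarks):
    """Flatten normalized landmarks into one list of numbers a model can ingest."""
    return [c for p in normalize_landmarks(landmarks) for c in p]

# Hypothetical femur landmarks (x, y, z); real data would come from CT scans or meshes.
small_femur = [(0.0, 0.0, 0.0), (0.0, 0.0, 10.0), (1.0, 0.0, 5.0), (0.0, 1.0, 5.0)]
# The same shape, three times larger and shifted somewhere else in the scanner.
large_femur = [(50 + 3 * x, 20 + 3 * y, 3 * z) for x, y, z in small_femur]

vec_small = to_feature_vector(small_femur)
vec_large = to_feature_vector(large_femur)
print(vec_small)
```

After normalization the two femurs produce the same vector, which is the point: shape is separated from size and position. Size can then be carried as its own feature rather than discarded.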

Then, we would teach the machine to use the genome to predict what kind of skeleton it would make. A deep neural network or transformer-like architecture would learn the statistical mappings from sequence space to shape space. To do this, the AI model would be exposed to many pairs of genomes and skeletons so it can learn the mappings between them. It may sound strange to go from DNA to a skeleton, but this is what these autoregressive systems do: they learn to predict one sequence from another. Contemporary systems have billions of parameters to tweak in order to memorize the intricate relationships between the input sequences and their corresponding output sequences. This forward model would have to be validated with data not used in training. For example, we could hold back the data from chickens in order to see how close to an actual chicken skeleton the system can get when given a chicken genome. If the system performs well, we would take this forward model and run it backwards.
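The hold-out test described above can be sketched with a toy forward model. Everything here is an assumption made for illustration: genomes and skeletons are stood in for by short random vectors, the "true" biology is a hidden linear map, and the model is plain linear regression trained by gradient descent rather than a deep network. What carries over to the real proposal is the protocol: train on every species except the chicken, then score the chicken.

```python
import random

random.seed(1)
GENOME_DIM, SKELETON_DIM = 3, 2
# Hidden "true" genotype-to-phenotype map, used only to generate synthetic data.
TRUE_W = [[random.gauss(0, 1) for _ in range(GENOME_DIM)] for _ in range(SKELETON_DIM)]

def forward(weights, genome):
    """Predict a skeleton vector from a genome vector with a linear map."""
    return [sum(w * g for w, g in zip(row, genome)) for row in weights]

def make_species(name):
    genome = [random.gauss(0, 1) for _ in range(GENOME_DIM)]
    skeleton = [y + random.gauss(0, 0.01) for y in forward(TRUE_W, genome)]
    return name, genome, skeleton

species = [make_species(f"species_{i}") for i in range(19)] + [make_species("chicken")]
train = [s for s in species if s[0] != "chicken"]      # the model never sees the chicken
held_out = [s for s in species if s[0] == "chicken"]

def fit(pairs, lr=0.05, epochs=2000):
    """Fit the linear forward model by batch gradient descent on squared error."""
    weights = [[0.0] * GENOME_DIM for _ in range(SKELETON_DIM)]
    for _ in range(epochs):
        grads = [[0.0] * GENOME_DIM for _ in range(SKELETON_DIM)]
        for _, genome, skeleton in pairs:
            pred = forward(weights, genome)
            for j in range(SKELETON_DIM):
                err = pred[j] - skeleton[j]
                for k in range(GENOME_DIM):
                    grads[j][k] += err * genome[k] / len(pairs)
        for j in range(SKELETON_DIM):
            for k in range(GENOME_DIM):
                weights[j][k] -= lr * grads[j][k]
    return weights

def mean_abs_error(weights, pairs):
    errors = [abs(p - y) for _, g, s in pairs for p, y in zip(forward(weights, g), s)]
    return sum(errors) / len(errors)

model = fit(train)
train_error = mean_abs_error(model, train)
chicken_error = mean_abs_error(model, held_out)
print(train_error, chicken_error)
```

If the held-out error is close to the training error, the model has learned something that generalizes across species instead of memorizing its examples. In practice one would hold out whole clades, not just a single species, since close relatives in the training set make the test too easy.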

Once the forward model is trained, we would invert it, applying it in reverse to predict plausible genomic sequences from input skeletal morphology. This is shown in the second step seen in the figure below. We would feed the inverted model skeletal data and train it to output genomes. This bi-directional genotype-phenotype modeling is similar to recent breakthroughs in protein design (AlphaFold → inverse design models like Chroma, RFdiffusion) where forward prediction unlocks plausible generative capacity in the inverse direction.

 


 

Once the model is adequately trained to turn skeletal data from living vertebrates into genetic sequences it could be fed morphometric data from a fossil dinosaur skeleton. This would necessitate a well-preserved and 3D mapped fossil dinosaur skeleton. After feeding this data into the inverse model, it would output a distribution of plausible genomic sequences consistent with bird-reptile sequence-morphology mappings (sampling from a probabilistic output space). If a bird’s genome explains its skeleton, then a dinosaur’s skeleton must have hidden information that can hint at its genome. You can see why this would not generate the true dinosaur genome. But it would generate a “best-fit plausible genome” grounded in comparative genomics and constrained by actual fossilized dino anatomy. This idea resembles recent AI breakthroughs in structure-function inference in proteins, but here is applied to deep-time vertebrate paleogenomics.

It is worth mentioning that it wouldn’t make sense to start by predicting bird genomes from bird skeletons; it must be done the other way around. The direction of causality plays a role, because genes cause bodies. Also, many different genotypes could result in similar skeletons, and thus predicting genomes from skeletons requires learning a much more complex probability distribution. That is why starting by predicting skeletons from genomes creates a framework of internalized generative biological relationships which can be used to constrain guesses when running it backwards on fossils. It is like learning how to bake bread: you must first learn how ingredients combine to make bread before you can try to work backwards and guess what ingredients went into a loaf by looking at its shape and texture. This method may sound outlandish at first, but it’s actually exactly what these models do and are being trained to get better at: to exhaustively search for and learn all of the predictive patterns that can be found between an input sequence and its output sequence.
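The many-to-one problem can be shown with a toy example. Assume, purely for illustration, that the forward model is a small linear map from a 3-number "genome" to a 2-number "skeleton." Because the map compresses, there is a direction in genome space that the skeleton cannot see, and every genome along that direction builds the identical skeleton. The matrix W and the vectors below are invented for the demonstration.

```python
def forward(weights, genome):
    """Toy forward model: skeleton = W @ genome."""
    return [sum(w * g for w, g in zip(row, genome)) for row in weights]

def cross(a, b):
    """Cross product of two 3-vectors; the result is perpendicular to both."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

# Hypothetical forward model mapping 3 genome features to 2 skeletal features.
W = [[1.0, 2.0, 0.0],
     [0.0, 1.0, 1.0]]

# A direction in genome space that has no effect on the skeleton (W @ d == 0).
silent_direction = cross(W[0], W[1])

genome_a = [0.5, -1.0, 2.0]
candidates = [[g + t * d for g, d in zip(genome_a, silent_direction)]
              for t in (-1.0, 0.0, 1.0, 2.0)]
skeletons = [forward(W, c) for c in candidates]

for c, s in zip(candidates, skeletons):
    print(c, "->", s)
```

Four different genomes, one skeleton. The inverse model therefore cannot return a single answer; it has to return a distribution, and priors learned from living birds and crocs are what narrow that distribution down.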

By training a model on species where both the genome and skeleton are known, it becomes possible to map out a shared space where genetic traits and anatomical features co-vary in predictable ways. It also unearths latent traits, which are hidden patterns that connect DNA and anatomy. While it will not recover full genomes, it will generate constrained, probabilistic profiles, possibly offering the most scientifically grounded glimpse yet into the genomic architecture of extinct species. Of course, adding more information, aside from just skeletal anatomy, will help us reconstruct genomes with more precision. Let’s discuss this next.

4.0 What Other Information Might Assist In This Reconstruction?

We can glean a lot of pertinent information from fossils. Bone morphology and histology give us size, shape, posture, musculature, joint angles, growth rate, vascularization, and stress markers. The bones also give us lots of internal geometry that relates to internal organs. Comparing fossils of animals of different ages provides information about the way the bones change over the lifespan as well as growth curves and the hormones and chemicals that might underlie them. This all informs which developmental genes (e.g., FGF, BMP, Runx2) likely governed these features. Bones alone could reveal significant genotype-to-phenotype mappings.

But paleontologists have so much more on dinosaurs than just bones. There are also fossils of soft tissue impressions that show skin, feathers, and scales, providing copious information that would be necessary for approximating dino features. These impressions offer a wealth of information about the shape and composition of soft tissues that would have sat on top of the bones. Given enough of this information about the 3D structure of the exterior of the body, it could also be analyzed using the three-step pipeline above. In that case, it would be compared to the exterior of birds and reptiles. Just look at this highly preserved (practically mummified) specimen of an ankylosaur. There are also some findings of dinosaur internal organs and blood vessels. Data like this provides a wealth of anatomical information that could be utilized.

Borealopelta - Wikipedia

There are many other forms of fossils that could lend meaningful data. Eggs are common and their size, shape, and composition provide clues. Trace fossils like tracks and footprints add to the detail and give us information about the feet, the gait, and biomechanics that could be compared to that of birds. Brain endocasts (casts of the interior of the bony braincase) are often recovered from fossil specimens, and they allow scientists to see the shape and size of the species’ brain. They also help paleoneurologists compare the proportional size of their neuroanatomical structures to the brains of crocodilians and birds. Even gastroliths (stomach stones) and coprolites (poop) offer mathematical and geometrical details. Scientists have also uncovered fossilized melanosomes (pigment-bearing structures) that offer strong clues about the colors of skin, scales, and feathers. Given that artists’ representations of dinosaurs (paleoart) have been improving in scientific rigor for over a century, an AI system could also utilize pictorial and video CGI reconstructions of dinosaurs as references.

Information about dinosaur behavior would also help, and thankfully scientists have been developing this for over a century. Careful study has revealed very specific inferences about nesting, brooding, pack hunting, mating, sociality, intelligence, and much more. Fossil burrows, resting traces, feeding traces, and gnaw marks add to the resolution. There has also been lots of exacting scientific work on ontogenetic development, pathology, thermoregulatory physiology, isotopic signatures, paleobotany, and paleoenvironmental reconstruction. All this information helps us make assumptions about behavior, and comparing it to bird behavior allows us to constrain neurological and endocrine gene candidates. Taken together, we have an extraordinary amount of collateral evidence that can guide probabilistic reconstruction. The multimodal AI would not just “guess blindly” but would condition its genome drafts on a broad constellation of well-studied physical, ecological, physiological, and behavioral constraints.

Most of the present research on resurrecting dinosaurs involves using phylogenetic methods to look for conserved and divergent sequences from birds to reconstruct a plausible common ancestor. Partially reconstructing the common ancestor of all birds, which researchers are working on now, would be helpful, but it would never carry us to extinct dinos. Currently, scientists can and do make crude mathematical predictions about ancient DNA from living species. Scientists have also been able to predict some physical human traits like face or bone shapes from genes. However, I think the present method could take all this a lot farther. It seems that no one has proposed a system that could learn to predict skeleton shape based on DNA from modern birds and crocodiles, and then use that knowledge in reverse to predict dinosaur DNA from fossil skeletons. I actually asked GPT, Gemini, Claude, and Grok to search the web and see if anything like what I am laying out here has already been proposed, and they each reported, after many combined minutes of search, that there is nothing like this on the internet and that it may be the best starting framework.

5.0 Birds and Reptiles Provide a Wealth of Genetic and Anatomical Data

Scientists use techniques such as comparative genomics and the molecular clock (among others) to map relatedness in vertebrates, and this gives our AI a huge family tree to reference. Remember that birds are technically dinosaurs, so we do have dino DNA, and this could go a long way toward predicting dinosaur genomes, especially certain kinds. Tyrannosaurus rex and Velociraptor, like all birds, were members of Theropoda, a group of bipedal, mostly carnivorous dinosaurs. In fact, birds and both of these theropods belong to an even more specialized branch called Coelurosauria, which includes feathered dinosaurs and some of the most cognitively advanced species of the Mesozoic. Because birds are themselves coelurosaurian theropods, extinct lineages within Coelurosauria would be the most manageable for a bird-based proxy approach to resurrection. Of course, dromaeosaurs like the raptors, as well as other paravian species (troodontids), would be closer (and easier to piece together) than a T. rex. That is kind of cool for us, because dromaeosaurs like Velociraptor and Utahraptor were likely among the most intelligent and agile of all dinosaurs.

The last common ancestor of all birds lived around 100 million years ago in the late Cretaceous. Scientists have employed sophisticated bioinformatics to trace back how bird genomes evolved through deep time, attempting to piece together this genome. But the common ancestor that birds shared with the T. rex would have lived 160 million years ago in the Jurassic. This is an additional 60 million years so unfortunately, that leaves lots of time for genetic changes. Another helpful landmark, the bird-crocodilian split, happened around 250 million years ago in the Triassic. This was the common archosaur ancestor and scientists have used advanced comparative genomics techniques to reconstruct an estimated 50% of its genome at 91% accuracy. 

A diagram of different dinosaurs

 

But it is not just birds that can help in this regard. Birds and crocodilians are living archosaurs, a taxonomic group (clade) that dinosaurs are also nested inside. That means that crocs (alligators, caimans, gharials, and crocodiles) have much to offer as well. They offer an excellent contrast with the birds to help us interpolate about dinosaurs. Genomic comparison of bird vs. croc genomes identifies both deeply conserved elements and divergent innovations. Using genomes from birds, crocs, and other reptiles would help us reconstruct the gene order on chromosomes, predict coding genes present in dinosaurs, and estimate regulatory architecture. It definitely helps that we have access to thousands of living archosaur and even coelurosaur genomes to study and use as references.

 

Not a lizard nor a dinosaur, tuatara is the sole survivor of a  once-widespread reptile group

 

Data from lizards, snakes, and turtles will also contribute. Even reptiles like the tuatara offer further comparative opportunities given their genetic distance from other reptiles, their slow-evolving genome, and ancient reptilian traits. In fact, there are many interesting species, including monotremes (egg laying mammals), that could help constrain the baseline reptilian architecture upon which dinosaur genomic traits evolved.

There are also some fascinating birds that lend comparative details. Large flightless birds like the cassowary are about as close as modern birds get to true dinos (their feet look just like the feet of the dinosaurs in Jurassic Park). Two species of giant moas (Dinornis robustus and Dinornis novaezealandiae) might provide some profound insights. These towering birds, the only birds without even a vestige of a wing, are now extinct, although they were hunted as recently as 500 years ago. Over the last decade, ancient-DNA labs have recovered both mitochondrial and sizable nuclear genomes from several moa species, including the two giant moas, one of which stood taller than 11 feet and weighed over 600 pounds. Because moa evolved the largest body masses ever achieved by birds (up to 250 kg) independently of today’s ratites (ostriches, emus, cassowaries), their genomes are a natural experiment in avian gigantism. They give us a statistically powerful way to see which genetic routes birds can and cannot take when they achieve very large sizes. Moas would supply our dinosaur-inference pipeline with needed genotype-phenotype pairs at the extreme end of body size.

A graph with a number of species

 

Some scientists have been attempting to take living birds and change specific genetic features to create a throwback look. Jack Horner (the inspiration for Dr. Grant in the Jurassic Park series and technical advisor on the films) has a project to create a “chicken-o-saurus.” This project is aimed at creating a modified chicken that expresses dormant dinosaur-like traits. Dr. Horner wants to use gene-editing tools like CRISPR to counter the recent genes that made birds less like dinos. He envisions a chicken with teeth, a long tail, arms with clawed hands, and a rounded snout rather than a beak. I imagine we would want to remove feathers, the keel on the sternum, and the pygostyle (fused tail vertebrae). Searching for dormant genes in birds, in this way, could be a valid technique. Scientists inspired by Horner’s ideas went on to make a beak-less chicken with a snout that looks very dinosaur-like. To accomplish this, they found a cluster of genes related to facial development that exists in birds, but no other animals. They used an inhibitor to suppress these genes in embryonic chickens and, as you can see, the resulting bird faces appear much more like their distant dinosaur ancestors.

Chickenosaurus: How Genetically Engineered Theme Park Monsters Could Soon  Be A Thing | BEYONDbones

There are several other de-extinction projects underway now, but they all involve animals whose entire genome has been recovered from their remains. Furthermore, none of these projects are true clones or resurrections. They all involve taking a related animal and changing a few genes (similar to the chicken-o-saurus concept). For example, Colossal Biosciences claims to be bringing animals such as the woolly mammoth, the thylacine, and dire wolves back from extinction. But in reality, they are taking the extinct genome as a reference and then editing genetic sites in the closest living relatives to make a proxy animal with key traits. To “create” a woolly mammoth they are giving Asian elephants “cold resistant” attributes such as fur, increased fat, altered hemoglobin, and smaller ears. This is achieved by multiplex gene editing. To create the “dire wolves” they edited 20 sites across 14 genes in gray-wolf DNA using sequences inferred from ancient dire wolf remains and then cloned the edited cells. Why don’t these companies just build an animal around the recovered genome? As the next section will explain, that is just too far beyond today’s technology.

6.0 Dinosaur Embryology

If we had a full, viable, synthetic Tyrannosaurus rex genome on a computer, bringing it to life would involve a complex series of technological steps. First it would have to go from zeros and ones on a computer to the actual DNA polymer. The genetic code would have to be synthesized in segments using methods like Gibson assembly or yeast-based artificial chromosome construction (YAC). Something like this has been done for microbes, but whole-genome synthesis has not yet been achieved for animals. The synthetic genome would be inserted into a host cell, like a de-nucleated bird ovum. This could be done via somatic cell nuclear transfer (SCNT), similar to the method used for Dolly the sheep. It is worth mentioning that even though several mammals have been cloned, scientists are still unable to clone a bird. The machinery of the cell that is hosting our T. rex genome must recognize it and properly express its proteins, helping it build a body. There must be no mismatches or incompatibilities with the cellular (cytoplasmic) environment or with host mitochondrial DNA. This ovum would then need to be implanted within a surrogate egg (even an ostrich egg is three to four times smaller than a T. rex egg) or artificial womb. Incubation conditions would have to align precisely with the embryological development. After hatching, the dinosaur would require intensive care, proper diet, temperature, humidity, and parental, brood, and social interaction. You can see how difficult this would be. Our technology is not there and not even close right now, but of course AI may change this rather rapidly. But let’s keep in mind that having vast information about dinosaur genomes has value outside of de-extinction, such as deepening our understanding of dinosaur biology and evolution.

7.0 What Other Animals Could Be Modeled Using this Framework?

Dino genes must be inferred because no truly intact, sequence-quality dinosaur DNA has ever been recovered and likely never will be. The only genetic traces extracted from dino fossils are highly degraded molecules (possible chromosome fragments, collagen sequences, and chemical DNA markers) found inside exceptionally preserved dinosaur cartilage or bone. Even these findings are controversial and nowhere near the quality needed for genome sequencing or “de-extinction.” Experiments on ancient bones show DNA’s average bond half-life is about 521 years at 13 °C. Extrapolating from this decay rate, even under ideal cold preservation (around −5 °C), every bond would be destroyed after around 6.8 million years. Unfortunately, non-avian dinosaurs went extinct 66 million years ago, roughly ten times beyond that limit. In fact, retrieving an entire genome from the fossil remains of any species becomes very difficult after 100,000 years. After one million years, even if preserved by very cold or dry conditions, any DNA that is recovered will be fragmentary.
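The arithmetic behind this is ordinary exponential decay. The sketch below assumes simple first-order kinetics and plugs in the 521-year half-life quoted above, which applies at 13 °C; colder burial would slow the decay considerably, so treat these numbers as an illustration of the math, not a preservation forecast. The function names are my own.

```python
import math

HALF_LIFE_YEARS = 521.0  # bond half-life at about 13 °C, as quoted in the text

def fraction_intact(years, half_life=HALF_LIFE_YEARS):
    """Fraction of bonds still intact after the given time, assuming first-order decay."""
    return 0.5 ** (years / half_life)

def log10_fraction_intact(years, half_life=HALF_LIFE_YEARS):
    """Same quantity as a base-10 logarithm, which avoids floating-point underflow."""
    return -(years / half_life) * math.log10(2)

for label, years in [("one half-life", 521),
                     ("10,000 years", 10_000),
                     ("1 million years", 1_000_000),
                     ("66 million years", 66_000_000)]:
    print(f"{label}: 10^{log10_fraction_intact(years):.1f} of bonds intact")
```

At this temperature the surviving fraction after 66 million years is so small that it cannot even be represented as an ordinary floating-point number, which is why the sketch reports it as a power of ten.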

The fact that DNA can easily survive 10,000 years means that dodos, thylacines (marsupial tigers), woolly mammoths, and saber-toothed cats would not need the technique I am introducing here to be cloned and resurrected. But there are many interesting species aside from dinosaurs to which this technique would need to be applied. These include ancient animals such as trilobites, eurypterids (sea scorpions), giant dragonflies, and ammonites; Mesozoic marine reptiles such as plesiosaurs, mosasaurs, and ichthyosaurs; Pleistocene megafauna such as woolly rhinoceros, cave lions, and giant ground sloths; early mammal-like reptiles (synapsids) such as pelycosaurs and cynodonts; as well as recent human ancestors such as australopithecines, Homo erectus, and Homo heidelbergensis. It should even work on plants and fungi because we have many fossils of ancient plants. It is unclear if a technique like this could stretch back 500 million years to Cambrian animals like Haikouichthys, Anomalocaris, and Hallucigenia in the absence of close modern relatives.

A diagram of dna sequence

It is worth mentioning that a process like the three-step pipeline described above could be used to approximate the faces of our hominin ancestors. Neanderthal, Denisovan, Homo floresiensis, and other skulls could be entered into an AI model after the model has been trained on human and ape skull / face pairings. This could be combined with other techniques to predict the facial features of ancient humans, another sight I had long thought lost to time before the advent of modern AI.

8.0 Weaknesses of This Approach

The method I have introduced here will not unearth the actual genomes; it will only make incredibly informed guesses about them. Even an advanced AI system will not be able to reproduce sequences where evolution introduced significant novelties, lineage-specific adaptations, or regulatory rewiring that has no modern parallel. All of the actual genetic mutations and adaptations that dinosaurs accumulated are lost to time. Furthermore, such a synthetic genome could result in a visually compelling likeness and an uncanny simulacrum of the artistic renderings but may largely fail to reproduce internal regulation. The process could result in animals that look like the dinosaurs in the movies but whose physiology and even behavior are closer to those of birds or crocs. This would risk creating a "chimeric reconstruction" rather than a resurrection.

Chromosome number confuses things. Some birds have 40 chromosomes, others have over 140, and it is anyone’s guess how many T. rex had. Regulatory sequences (e.g., promoters, enhancers, and silencers) control when, where, and how much genes are expressed. They are often species-specific and evolve rapidly. A T. rex might have had unique enhancers for muscle growth or bone density that no longer exist in its living relatives. Epigenetic modifications (DNA methylation and histone modification) also influence gene activity without altering the DNA sequence. These marks decompose along with the DNA, so AI would have to hypothesize working epigenetic profiles based on modern analogs, adding further uncertainty. Non-coding DNA (DNA that does not code for proteins, e.g., introns, regulatory elements, and supposed junk DNA) also poses an issue. It comprises the vast majority of the genome (98-99% in humans), evolves differently from coding regions, often contains lineage-specific adaptations, and lacks clear genotype-phenotype correlations. The present skeletal morphology technique could capture coding genes linked to bone structure, but non-coding DNA’s role in overall genome stability, gene regulation, and phenotypic variation would remain unaddressed.

It is also important to mention that there are serious ethical and ecological concerns at play here that are outside the scope of this entry. For instance, humans artificially selected pug dogs to have a collapsed snout, which made it difficult for them to breathe. There are many examples of domestication creating disease states, and it is clear that an engineered dinosaur could be born into an uncomfortable, painful, or diseased body. Hollywood has already pointed out many of the ethical quandaries of de-extinction, including animal cruelty, human safety, and the ecological risks of invasive species.

Currently, you cannot fit an entire vertebrate genome into the attentional window of an AI. This means that it cannot take the entire genome into account when making predictions and that some long-range dependencies may not be recognized. Bird genomes are around 1 billion base pairs (1.0 to 1.3 gigabase pairs, Gb) and crocodile genomes contain around 2 to 3 billion. ChatGPT can only hold about 128,000 tokens at a time and Google Gemini can hold around one million. Even at one token per base pair, that means the attentional window needs to be over 1,000 times bigger. The industry has seen attention windows doubling every 18 to 24 months. A 1,000-fold increase takes about ten doublings, so at this rate it would be around 15 to 20 years before a transformer’s window of attention can encompass all of the DNA in question. Of course, there are many ways to get around this, even today (preprocess the genome into embeddings, prioritize known relevant loci, and use hierarchical architectures), but this is just one example of the fact that as technology grows this idea becomes more feasible.
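The timeline above is sensitive to both the starting window size and the doubling period, so it is worth sketching the arithmetic. The Python snippet below is only a back-of-the-envelope check: it assumes one token per base pair and steady exponential growth (both simplifications), and the function names are hypothetical.

```python
import math

def doublings_needed(current_tokens, target_tokens):
    """How many times the window must double to reach the target size."""
    return math.log2(target_tokens / current_tokens)

def years_needed(current_tokens, target_tokens, months_per_doubling):
    """Years required if the window doubles once every `months_per_doubling`."""
    return doublings_needed(current_tokens, target_tokens) * months_per_doubling / 12

# Assumption: one token per base pair, with a roughly 1 Gb bird genome as the target.
BIRD_GENOME_BP = 1_000_000_000

for label, window in [("128,000 tokens", 128_000), ("1,000,000 tokens", 1_000_000)]:
    n = doublings_needed(window, BIRD_GENOME_BP)
    fast = years_needed(window, BIRD_GENOME_BP, 18)
    slow = years_needed(window, BIRD_GENOME_BP, 24)
    print(f"{label}: {n:.1f} doublings, {fast:.0f} to {slow:.0f} years")
```

Starting from a one-million-token window, a billion-base-pair genome is roughly ten doublings away; a crocodile genome at 2 to 3 Gb would add one or two more.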

The jump from predicting skeletal morphology to generating functional genomes capable of producing viable organisms is a significant one. The mappings are highly nonlinear and dependent on environmental context. Moreover, a single dinosaur skeleton could be consistent with multiple plausible genomes, there is no way to test whether any of them are accurate, and determining which ones would be biologically viable could be very difficult. One of the most sobering truths about this enterprise is the difficulty inherent in embryology. To progress from a zygote, to an embryo, to a fetus, to a healthy young animal, the genetic blueprint must be incredibly internally consistent. This is easy for nature to accomplish, but just having an AI dream up (or worse, hallucinate) an animal genome gives little reassurance that there will not be structural inconsistencies due to the tremendous complexity of gene interactions. Everything must work together, and work just right, to avoid developmental failure. Of course, this is a problem that a far-future superintelligence could solve, but it won’t be solved using the three-step pipeline outlined above.
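The point about one skeleton matching many genomes can be illustrated with a deliberately tiny toy model. The Python sketch below assumes a purely linear genotype-to-skeleton map with more “genes” than measured traits; the matrix, the dimensions, and the variable names are all hypothetical, and real biology is far less tidy. It only shows that when the genome has more degrees of freedom than the skeleton, distinct genomes can yield identical measurements.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes, n_traits = 6, 3

# Hypothetical linear map from a genome vector to skeletal measurements.
W = rng.normal(size=(n_traits, n_genes))

genome_a = rng.normal(size=n_genes)

# With full SVD, the last rows of Vt span the null space of W:
# directions in "genome space" that leave the skeleton unchanged.
_, _, Vt = np.linalg.svd(W)
null_direction = Vt[-1]

# A second, clearly different genome that builds the same skeleton.
genome_b = genome_a + 2.0 * null_direction

skeleton_a = W @ genome_a
skeleton_b = W @ genome_b

print("same skeleton:", np.allclose(skeleton_a, skeleton_b))
print("distance between genomes:", np.linalg.norm(genome_a - genome_b))
```

This is why any inversion step needs some additional prior, such as staying close to the genomes of living relatives, to choose among the candidates.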

But this pipeline may be more useful than it first seems. The three-step methodology outlined here is not just suited for biology. It could be used as a generalizable framework for cross-domain translation, where one set of observable features (like morphology) is used to infer a related but unobservable set (like genetics) via a latent, learned manifold. In fact, this method (train forward, invert, apply to unknowns) is a flexible blueprint for abductive reasoning via deep representation learning, and it could have enormous potential beyond biology. You would use it whenever you have A, B, and C, but not D, and A is to B as C is to D. We could call it bidirectional manifold mapping for latent inference. It learns to model a latent manifold that encodes the “grammar” of a domain. Once that space is shaped well enough, inverting across it becomes a powerful general inference engine.
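The “train forward, invert, apply to unknowns” pattern can be sketched in a few lines of Python. This is a toy under loud assumptions: the data are synthetic, the forward model is linear least squares instead of a deep network, and the inversion is a ridge-regularized solve that pulls the answer toward the average training genome. Every name here (infer_genome, forward_map, and so on) is hypothetical and not part of any existing pipeline.

```python
import numpy as np

rng = np.random.default_rng(42)
n_species, n_genes, n_traits = 200, 8, 5

# Synthetic stand-in for paired data from living species
# (A = genome features, B = skeletal measurements).
true_map = rng.normal(size=(n_genes, n_traits))
genomes = rng.normal(size=(n_species, n_genes))
skeletons = genomes @ true_map + 0.01 * rng.normal(size=(n_species, n_traits))

# Step 1: train forward (genome -> skeleton) by ordinary least squares.
forward_map, *_ = np.linalg.lstsq(genomes, skeletons, rcond=None)

# Step 2: invert. Many genomes fit one skeleton, so choose the one
# closest to a prior genome (ridge-regularized least squares).
def infer_genome(skeleton, prior, strength=1e-6):
    A = forward_map.T  # shape (n_traits, n_genes): skeleton = A @ genome
    lhs = A.T @ A + strength * np.eye(A.shape[1])
    rhs = A.T @ (skeleton - A @ prior)
    return prior + np.linalg.solve(lhs, rhs)

# Step 3: apply to an unknown (C = a fossil skeleton, D = its inferred genome).
fossil_skeleton = rng.normal(size=n_genes) @ true_map
prior = genomes.mean(axis=0)
inferred = infer_genome(fossil_skeleton, prior)

# The inferred genome should reproduce the fossil's measurements
# when pushed back through the forward model.
residual = np.abs(inferred @ forward_map - fossil_skeleton).max()
print("largest reconstruction error:", residual)
```

A small reconstruction error only shows that the inferred genome is consistent with the skeleton under the learned model, not that it matches the true one; in this toy the prior, not the fossil, decides the three directions the skeleton cannot constrain.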

9.0 Conclusion

One of the fondest memories I have is of reading Michael Crichton’s Jurassic Park in third grade, before the first movie came out, and marveling at the idea of extracting dino DNA from a mosquito trapped in amber. Today, we know it is impossible, but it sure felt elegant at the time. Even without DNA recovery, AI promises a new kind of “virtual paleogenetics,” a way to infer and simulate the genomes and physiologies of extinct organisms using bioinformatics, sophisticated prediction and comparative analysis. This is due in part to another scientific concept that enraptured me as a child (and the original Jurassic Park novel got right), that birds are dinosaurs.

Despite the major hurdles, the method outlined here could be a good starting point for several reasons. Creating the first genome-to-skeleton bidirectional models would produce incremental value at each step, advancing the understanding of genotype-phenotype interactions. Even without resurrection, these AI-generated dino genomes could enable synthetic cell lines for studying dinosaur biochemistry in vitro, organoid models, or simulations of growth curves, muscle structure, and thermoregulation.

The present method would only produce a synthetic “consensus dinosaur genome,” not a true historical sequence. It would be educated guesswork, but firmly grounded in real data. It could even result in living and breathing organisms that look how we expect dinosaurs to look, although they may not function or act like the real thing. I believe that creating a Jurassic Park with cloned approximations will be possible when methods like those discussed here are in the hands of superintelligent AI agents. In fact, I wouldn’t be surprised if this fantasy were brought to life within our lifetimes.

 


Note: I started writing this blog after having a strong sense that artificial superintelligence could achieve dinosaur resurrection and I wanted to provide a description of that vision. But after posting it, I knew something was missing. So I sat on the couch for 10 minutes and racked my brain, telling myself over and over that there was something important I was missing. And then somehow the idea for the three-step pipeline basically entered my mind fully formed. It felt like the idea just materialized, possibly from unconscious incubation, because there was next to zero reasoning involved. It’s not going to be easy to implement, but I do think it could reach and harness key latent information hidden in dinosaur fossils.

 

 

 
