Synthetic data, by construction.
100% synthetic. Not because of policy. Because of architecture. The trust posture that follows when the platform genuinely does not require scholarly correspondence to be trained.
Every AI vendor today says some version of "we protect your data." The sentence is necessary, expected, and almost meaningless. What actually matters is whether the architecture forces the vendor to protect the data, or whether the protection is policy on top of an architecture that does not require it.
Eve-Genesis is the second kind. The synthetic-by-construction posture is not a policy we adopted; it is a property of how the dataset is built. The training set is generated synthetically. The reasoner is LoRA fine-tuned on the synthetic corpus. Scholarly correspondence does not enter the training set because there is no architectural slot for it to enter.
Four commitments that follow
One. 100% synthetic. No user conversation enters the training set. Not because of policy; because the architecture genuinely does not require it.
Two. The IPSC corpus is the grounding source for answers, not the training source for the reasoner. The corpus is read at inference time; the model that reads it was trained synthetically.
Three. Versioned and provenance-traceable. Every Eve-Genesis edition is versioned. Every record traces to a generation pass with the reasoning structure documented at authoring time.
Four. Frontier-independent. Frontier models are commodity consultants in the architecture. The Uṣūl reasoning IP is ours. As the frontier moves, the platform appreciates rather than depreciates.
What this means for the scholarly community
A scholar using TheoAI to research a question sends queries that are not used to train the reasoner. The corpus is what the reasoner reads; not what the reasoner is trained on. That separation is the trust posture, and it is architectural.