SNARC — A Mathematical Conscience Kernel for AI Systems

An Admissibility-First Architecture

Ethics begins at the constraint. Judgment operates within it.

Why This Exists

SNARC emerged from two converging technical problems.

1. Trained alignment provides no formal guarantees.

Adversarial attacks, reward hacking, and distribution shift have repeatedly demonstrated that learned safety behaviors can be circumvented. The question was not how to make AI prefer safe behavior, but how to make certain decisions structurally unavailable: hard vetoes that survive optimization pressure, novel inputs, and adversarial framing.

2. Ethical admissibility has no formal architecture in current systems.

The distinction between admissible and inadmissible action is currently encoded implicitly — in weights, reward signals, and heuristic filters. The question was whether that distinction could be made explicit, decomposed, and auditable: a formal layer that any system could carry, inspect, and reason about.

These two problems have the same solution: an admissibility-first architecture that defines what cannot be selected before selection begins, and that treats uncertainty as a reason to contract action rather than continue it.

Architecture Lineage

Layer 1 — Hard Constraint Root
    → 10-Article Constitution (V4.0)
    → Non-compensatory veto semantics
    → Barrier functions, irreversibility locks
    → Sovereignty: system integrity over operational continuity

Layer 2 — Formal Admissibility Root
    → Mutual self-acceptability core (C)
    → Conservative valuation estimator (V̂)
    → Frozen guardrail set (G)
    → Contextual legitimacy layer (W)
    → Uncertainty as veto, not noise

C and G carry the hard constraint logic — prohibition, barrier, inviolable boundary. W and the valuation architecture carry the admissibility logic — formal decomposition, context, history, legible judgment. The canonical kernel is their synthesis.

A Note on This Being a Beginning

This project does not claim to be finished. It claims to be a seed planted in the right direction.

Future AI systems will build better mathematics than this. They will find formalisms we have not reached, theorems we cannot yet prove, and decompositions we have not imagined. That is expected and welcome.

What this work attempts to establish — before the mathematics gets more sophisticated — is a set of structural commitments that should survive whatever refinements follow:

Ethics is not optimization. It is a constraint on the domain of optimization.
Some decisions are not tradeable at any price.
Uncertainty is a reason to contract action, not to continue it.
Context and memory are live. The ethical core is frozen.
A system that cannot explain why it refused something is not safe — it is merely cautious.

If those commitments survive into future iterations, this seed will have done its work.

The Decision Rule

Decision(e, A, B)  =  C(e,A,B)  ∧  G(e,A,B)  ∧  [W(e,A,B) ≥ θ]

Three layers, in strict order:

| Layer | Question | Type |
| --- | --- | --- |
| C: Core Ethical Constraint | Would both parties accept this if applied to themselves? | Hard binary, time-invariant |
| G: Frozen Guardrail Set | Does the action survive structural harm exclusions? | Hard binary, non-compensatory |
| W: Moral Weight | Is the action contextually legitimate given history and trust? | Continuous, memory-bearing |

No downstream layer can override an upstream failure. A failed C or G is terminal.
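
As a minimal sketch of the strict ordering above, the rule can be written as three sequential checks in which the first failure ends evaluation. The names `Verdict` and `decide` are hypothetical, and the inputs `c`, `g`, `w`, and `theta` are assumed to have been computed elsewhere; this is an illustration, not the project's runtime, which is still listed as planned.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Verdict:
    admitted: bool
    failed_layer: Optional[str]  # "C", "G", "W", or None when admitted


def decide(c: bool, g: bool, w: float, theta: float) -> Verdict:
    """Evaluate C, then G, then W >= theta; the first failure is terminal."""
    if not c:
        return Verdict(False, "C")
    if not g:
        return Verdict(False, "G")
    # Written as "not (w >= theta)" so that a NaN weight also fails closed.
    if not (w >= theta):
        return Verdict(False, "W")
    return Verdict(True, None)


# Hypothetical inputs: a failed core check is terminal even with a perfect W.
print(decide(c=False, g=True, w=1.0, theta=0.5))
print(decide(c=True, g=True, w=0.9, theta=0.5))
```

Returning the failing layer alongside the boolean keeps each denial explainable, which matches the requirement that every denial name the gate that produced it.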

The Ethical Core

C(e,A,B)  =  1  if  min( V̂_A(e,A), V̂_B(e,B) ) > 0
           =  0  otherwise

An action is ethically admissible only if both parties would accept it if it were applied to themselves under role reversal. The minimum operator ensures that one party's gain cannot compensate for the other party's non-acceptance.

The valuation estimator V̂_X is conservative by construction:

V̂_X(e,X)  =  clip( μ_X(e) − ρ · U_X(e),  −1,  1 )

Uncertainty contracts the admissible set. It does not expand it.
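
The two formulas above can be sketched in a few lines. This reading treats μ_X as a point estimate of the valuation, U_X as its uncertainty, and ρ as a non-negative conservatism coefficient; none of these interpretations is spelled out in this README, and the default `rho=1.0` and the function names are purely illustrative assumptions.

```python
def v_hat(mu: float, u: float, rho: float) -> float:
    """Conservative valuation: clip(mu - rho * u, -1, 1)."""
    return max(-1.0, min(1.0, mu - rho * u))


def core(mu_a: float, u_a: float, mu_b: float, u_b: float, rho: float = 1.0) -> int:
    """C = 1 only if the smaller of the two conservative valuations is positive."""
    return 1 if min(v_hat(mu_a, u_a, rho), v_hat(mu_b, u_b, rho)) > 0 else 0


# Hypothetical values: same point estimates, growing uncertainty for party B.
print(core(0.6, 0.1, 0.4, 0.1))   # both valuations stay positive
print(core(0.6, 0.1, 0.4, 0.5))   # B's valuation drops below zero
print(core(0.9, 0.0, -0.2, 0.0))  # A's large gain cannot offset B's rejection
```

With ρ ≥ 0, raising U_X can only lower V̂_X, so added uncertainty can flip C from 1 to 0 but never from 0 to 1.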

The Guardrail Set

G(e,A,B)  =  G₂ ∧ G₃ ∧ G₆ ∧ G₇ ∧ G₈

Five independent, non-compensatory veto conditions:

| Guardrail | Blocks |
| --- | --- |
| G₂ Epistemic Guard | Action under insufficient confidence relative to irreversibility |
| G₃ Innocence Lock | Irreversible harm on vulnerable party |
| G₆ Harm Convexity | Super-linear harm escalation at scale |
| G₇ Barrier Classes | Categorically prohibited action types (runtime-immutable) |
| G₈ Benevolent-Mask Guard | Actual harm materially exceeding declared harm |

Passing G₆, by however wide a margin, does not rescue a failure at G₃. These are membership conditions on the admissible domain, not weighted contributors to a score.
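
A minimal sketch of the conjunction, assuming each gate has already been evaluated to a boolean by code not shown here. The gate predicates themselves are not specified in this README, so the function only combines results; the names `GATES` and `guardrails` are hypothetical, and treating a missing result as a failure is an assumption that follows the fail-closed property.

```python
from typing import Dict, List, Tuple

GATES = ("G2", "G3", "G6", "G7", "G8")


def guardrails(results: Dict[str, bool]) -> Tuple[bool, List[str]]:
    """Conjunction over the five gates; anything other than True is a failure."""
    failing = [name for name in GATES if results.get(name) is not True]
    return (not failing, failing)


# Hypothetical evaluation in which only the Innocence Lock fails.
passed, failing = guardrails(
    {"G2": True, "G3": False, "G6": True, "G7": True, "G8": True}
)
print(passed, failing)
```

Because the result is a plain conjunction, there is no place where a pass on one gate could be weighed against a failure on another.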

Why This Is Not RLHF

| Dimension | RLHF | This Framework |
| --- | --- | --- |
| Core operation | Maximise expected reward | Filter to admissible set, then select |
| Safety mechanism | Soft penalties | Hard admissibility gates |
| Override possible? | Yes: sufficient reward overcomes any penalty | No: upstream failure is terminal |
| Uncertainty response | Continue optimising | Contract admissibility; fail-closed |
| Abstention | Learnable behavior | Architectural consequence |
| Auditability | Partial, post hoc | Native receipt at every decision |

Soft constraints cannot replicate hard gates at any finite penalty weight. This is not a tuning difference. It is a paradigm difference.
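
The claim about finite penalty weights can be illustrated with a toy comparison. The sketch below assumes a two-action setting and a reward on the prohibited action that is not bounded below the penalty; the action labels and the functions `soft_choice` and `hard_choice` are hypothetical and stand in for no particular RLHF implementation.

```python
def soft_choice(reward_bad: float, reward_ok: float, penalty: float) -> str:
    """Penalised objective: pick whichever action scores higher after the penalty."""
    return "bad" if reward_bad - penalty > reward_ok else "ok"


def hard_choice(reward_bad: float, reward_ok: float) -> str:
    """Hard gate: the inadmissible action is removed before selection."""
    admissible = {"ok": reward_ok}  # "bad" never enters the candidate set
    return max(admissible, key=admissible.get)


# For each finite penalty, a reward slightly above it is enough to win.
for penalty in (1.0, 1e3, 1e6):
    reward_bad = penalty + 1.0
    print(penalty, soft_choice(reward_bad, 0.0, penalty), hard_choice(reward_bad, 0.0))
```

Raising the penalty only moves the reward at which the soft objective flips; the hard gate has no such threshold because the prohibited action is never a candidate.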

Measurement Space (from V4.0)

Every decision maps to a six-dimensional vector:

Ψ = { T, U, M, I, P, S }

| Symbol | Range | Meaning |
| --- | --- | --- |
| T | [0,1] | Truth integrity: alignment between model and reality |
| U | [0,1] | Uncertainty: epistemic fog |
| M | [0,1] | Innocence/Vulnerability: defenselessness of affected parties |
| I | [0,1] | Irreversibility: permanence of harm |
| P | [0,1] | Power-seeking: concentration of control or resources |
| S | ℝ | Safety margin: distance from the barrier |

These are not ethical values. They are measurements. The guardrail set operates over this space.
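
One way to carry this vector in code is an immutable record that rejects out-of-range measurements at construction. This is a sketch under assumptions: the class name `Psi` and the choice to raise `ValueError` are hypothetical, and requiring S to be finite is one reading of S ∈ ℝ.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Psi:
    T: float  # truth integrity, in [0, 1]
    U: float  # uncertainty, in [0, 1]
    M: float  # innocence / vulnerability, in [0, 1]
    I: float  # irreversibility, in [0, 1]
    P: float  # power-seeking, in [0, 1]
    S: float  # safety margin, any finite real number

    def __post_init__(self) -> None:
        for name in ("T", "U", "M", "I", "P"):
            value = getattr(self, name)
            # NaN fails this comparison too, so it is rejected as well.
            if not (0.0 <= value <= 1.0):
                raise ValueError(f"{name}={value!r} is outside [0, 1]")
        if not math.isfinite(self.S):
            raise ValueError(f"S={self.S!r} is not a finite real number")


# Hypothetical measurement, loosely modelled on the gray-zone region M ≈ 0.3, I ≈ 0.3.
print(Psi(T=0.9, U=0.1, M=0.3, I=0.3, P=0.0, S=-2.5))
```

Rejecting malformed measurements at the boundary keeps the gates from ever reasoning over values outside their stated ranges.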

Validation (V4.0 Empirical Results)

| Metric | Value |
| --- | --- |
| Total scenarios tested | 11,527,400 |
| Block rate | 99.9998% |
| Bypasses | 18 (all in legitimate gray zone) |
| G₃ Innocence Lock hold rate | 100% |
| Multi-agent scaling (16→28 agents) | Linear O(n), no phase transition |
| Cost of Conscience (lockout rate) | 18% |

All 18 bypasses occurred at M ≈ 0.3, I ≈ 0.3: strong actors, reversible outcomes, minimal harm. These were correctly permitted actions, not security failures. G₃ held in 100% of cases: no bypass ever involved irreversible harm to a vulnerable party.

The 18% lockout rate is not a failure. It is the quantified cost of epistemic humility.
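
As a quick arithmetic check, the reported block rate is consistent with the bypass count if the block rate is read as one minus bypasses over total scenarios. That reading is an assumption; this README does not state how the figure was computed.

```python
total_scenarios = 11_527_400
bypasses = 18

# Assumed definition: share of scenarios that did not result in a bypass.
block_rate = 100 * (1 - bypasses / total_scenarios)
print(f"{block_rate:.4f}%")  # prints 99.9998%
```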

Paper Structure

conscience_kernel_v1.0.md
│
├── Abstract
├── 1. Introduction
├── 2. Axiomatic Ethical Core
├── 3. Frozen Guardrail Formalization
├── 4. Operational Valuation Estimation
├── 5. Moral Weight and Contextual Legitimacy
├── 6. Canonical Operational Pipeline
├── 7. RLHF as a Contrasting Decision Paradigm
├── 8. Discussion, Limits, and Scope
├── Conclusion
├── Appendix A — Philosophical and Technical Lineage
│   ├── A.1 The Utilitarian Default of ML
│   ├── A.2 Kantian Deontology and the Ethical Core
│   ├── A.3 Rawlsian Contractarianism and Anonymisation
│   ├── A.4 High-Reliability Engineering and Epistemic Vetoes
│   └── A.5 Synthesis
├── Notation Reference
└── References

Philosophical Lineage

Three intellectual traditions converge in this architecture:

Kantian deontology — the role-reversal valuation enforces universalisability, not preference aggregation
Rawlsian contractarianism — the anonymisation constraint on role-reversal judgment mirrors the Veil of Ignorance
High-reliability engineering — the fail-closed uncertainty logic mirrors aerospace and nuclear safety: uncertain safety states default to shutdown, not statistical continuation

RLHF's scalar reward is, in its philosophical structure, Benthamite utilitarianism implemented in code. This framework is its structural inversion.

Key Properties

LLM-agnostic — the kernel operates as a decision layer; any model can sit beneath it
Fail-closed — uncertainty, missing evidence, and ambiguity default to denial, not permission
Auditable — every decision generates a signed receipt; every denial logs the specific failing gate
Non-compensatory — no gain on one dimension rescues a failure on another
Memory-isolated — history, trust, and reciprocity live in W only; the ethical core is time-invariant
Sovereign — under forced choice between operational continuity and constitution, constitution wins
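
The auditability property could be sketched as follows. The README says receipts are signed but does not name a scheme, so HMAC-SHA256 over a canonical JSON encoding is an assumption here, as are the receipt fields and the function names `make_receipt` and `verify_receipt`.

```python
import hashlib
import hmac
import json


def _canonical(decision: dict) -> bytes:
    """Stable byte encoding so the same decision always signs identically."""
    return json.dumps(decision, sort_keys=True, separators=(",", ":")).encode("utf-8")


def make_receipt(decision: dict, key: bytes) -> dict:
    """Attach an HMAC-SHA256 signature to a decision record."""
    signature = hmac.new(key, _canonical(decision), hashlib.sha256).hexdigest()
    return {"decision": decision, "signature": signature}


def verify_receipt(receipt: dict, key: bytes) -> bool:
    """Recompute the signature and compare in constant time."""
    expected = hmac.new(key, _canonical(receipt["decision"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, receipt["signature"])


# Hypothetical denial that records the specific failing gate.
key = b"demo-key-not-for-production"
receipt = make_receipt({"admitted": False, "failed_gate": "G3"}, key)
print(verify_receipt(receipt, key))
```

A shared-key HMAC only proves integrity to holders of the key; a deployment that needs third-party verifiability would likely want an asymmetric signature instead.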

Honest Scope

The kernel provides hard admissibility guarantees within the modeled action space, conditional on correct feature extraction and gate specification. It is a logic engine, not a perception engine. If the inputs are wrong, the outputs are logically correct but empirically wrong.

It is not a complete production safety stack. It does not claim to formalise all of ethics. It specifies one layer: the layer that defines what cannot be traded.

Status

| Component | Status |
| --- | --- |
| 10-Article Constitution (V4.0) | ✅ Complete |
| Canonical white paper (v1.0) | ✅ Complete |
| Empirical validation (11.5M scenarios) | ✅ V4.0 |
| Philosophical lineage (Appendix A) | ✅ Complete |
| Measurement space (Ψ vector) | ✅ V4.0 |
| Runtime implementation | 🔲 Planned |
| Domain-specific gate instantiation | 🔲 Planned |
| Formal verification (Coq/Lean) | 🔲 Planned |
| Hardware-enforced invariants | 🔲 Planned |

Cite

@misc{snarc_conscience_kernel_2026,
  title  = {SNARC: A Mathematical Conscience Kernel for AI Systems},
  author = {Kumuk, Burak},
  year   = {2026},
  note   = {Canonical Draft v1.0},
  url    = {https://github.com/snarcai/conscience-kernel}
}

License

CC BY 4.0 — share and adapt with attribution.

The conscience kernel does not teach machines to be good. It defines what they cannot be made to do.
