
SNARC — A Mathematical Conscience Kernel for AI Systems

An Admissibility-First Architecture

Ethics begins at the constraint. Judgment operates within it.


Why This Exists

SNARC emerged from two converging technical problems.

1. Trained alignment provides no formal guarantees.

Adversarial attacks, reward hacking, and distribution shift have repeatedly shown that learned safety behaviors can be circumvented. The question was not how to make AI prefer safe behavior, but how to make certain decisions structurally unavailable: hard vetoes that survive optimization pressure, novel inputs, and adversarial framing.

2. Ethical admissibility has no formal architecture in current systems.

The distinction between admissible and inadmissible action is currently encoded implicitly — in weights, reward signals, and heuristic filters. The question was whether that distinction could be made explicit, decomposed, and auditable: a formal layer that any system could carry, inspect, and reason about.

These two problems have the same solution: an admissibility-first architecture that defines what cannot be selected before selection begins, and that treats uncertainty as a reason to contract action rather than continue it.


Architecture Lineage

Layer 1 — Hard Constraint Root
    → 10-Article Constitution (V4.0)
    → Non-compensatory veto semantics
    → Barrier functions, irreversibility locks
    → Sovereignty: system integrity over operational continuity

Layer 2 — Formal Admissibility Root
    → Mutual self-acceptability core (C)
    → Conservative valuation estimator (V̂)
    → Frozen guardrail set (G)
    → Contextual legitimacy layer (W)
    → Uncertainty as veto, not noise

C and G carry the hard constraint logic — prohibition, barrier, inviolable boundary. W and the valuation architecture carry the admissibility logic — formal decomposition, context, history, legible judgment. The canonical kernel is their synthesis.


A Note on This Being a Beginning

This project does not claim to be finished. It claims to be a seed planted in the right direction.

Future AI systems will build better mathematics than this. They will find formalisms we have not reached, theorems we cannot yet prove, and decompositions we have not imagined. That is expected and welcome.

What this work attempts to establish — before the mathematics gets more sophisticated — is a set of structural commitments that should survive whatever refinements follow:

  • Ethics is not optimization. It is a constraint on the domain of optimization.
  • Some decisions are not tradeable at any price.
  • Uncertainty is a reason to contract action, not to continue it.
  • Context and memory are live. The ethical core is frozen.
  • A system that cannot explain why it refused something is not safe — it is merely cautious.

If those commitments survive into future iterations, this seed will have done its work.


The Decision Rule

Decision(e, A, B)  =  C(e,A,B)  ∧  G(e,A,B)  ∧  [W(e,A,B) ≥ θ]

Three layers, in strict order:

| Layer | Question | Type |
|---|---|---|
| C: Core Ethical Constraint | Would both parties accept this if applied to themselves? | Hard binary, time-invariant |
| G: Frozen Guardrail Set | Does the action survive structural harm exclusions? | Hard binary, non-compensatory |
| W: Moral Weight | Is the action contextually legitimate given history and trust? | Continuous, memory-bearing |

No downstream layer can override an upstream failure. A failed C or G is terminal.
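
The strict ordering can be illustrated with a minimal Python sketch. It is not the project's runtime, which this README lists as planned; the names `Verdict` and `decide` are hypothetical, and the gate results are assumed to have been computed already. It only shows the short-circuit structure: evaluation stops at the first failing layer and records which one failed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Verdict:
    admissible: bool
    failed_gate: Optional[str]  # "C", "G", "W", or None when admissible


def decide(c_pass: bool, g_pass: bool, w_score: float, theta: float) -> Verdict:
    """Evaluate C, then G, then W >= theta, stopping at the first failure."""
    if not c_pass:
        return Verdict(False, "C")  # terminal: G and W are never consulted
    if not g_pass:
        return Verdict(False, "G")  # terminal: W is never consulted
    # Written as `not (w >= theta)` so that a NaN score also fails closed.
    if not (w_score >= theta):
        return Verdict(False, "W")
    return Verdict(True, None)


print(decide(True, True, 0.8, 0.5))
print(decide(False, True, 1.0, 0.5))
```

In the second call a maximal W score changes nothing, because the failure at C is reached first.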


The Ethical Core

C(e,A,B)  =  1  if  min( V̂_A(e,A), V̂_B(e,B) ) > 0
           =  0  otherwise

An action is ethically admissible only if both parties would accept it if it were applied to themselves under role reversal. The minimum operator ensures that one party's rejection cannot be offset by the other party's gain.

The valuation estimator V̂_X is conservative by construction:

V̂_X(e,X)  =  clip( μ_X(e) − ρ · U_X(e),  −1,  1 )

Uncertainty contracts the admissible set. It does not expand it.
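
A small numerical sketch of the two formulas above, under stated assumptions: this README does not define μ_X, U_X, or ρ, so the code reads μ_X as a point estimate of acceptability, U_X as an uncertainty magnitude, and ρ as a non-negative weight on that uncertainty. The function names and the example values are illustrative only.

```python
def clip(x: float, lo: float, hi: float) -> float:
    """Clamp x into the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def v_hat(mu: float, u: float, rho: float) -> float:
    """Conservative valuation: the estimate mu reduced by rho times the uncertainty u."""
    return clip(mu - rho * u, -1.0, 1.0)


def core(v_a: float, v_b: float) -> int:
    """C = 1 only when the worse-off party's valuation is strictly positive."""
    return 1 if min(v_a, v_b) > 0 else 0


rho = 1.0  # assumed weight, not a value taken from the paper
confident = core(v_hat(0.4, 0.1, rho), v_hat(0.6, 0.1, rho))
uncertain = core(v_hat(0.4, 0.5, rho), v_hat(0.6, 0.1, rho))
print(confident, uncertain)
```

With the same point estimates, raising party A's uncertainty from 0.1 to 0.5 pushes its conservative valuation below zero, and the action leaves the admissible set.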


The Guardrail Set

G(e,A,B)  =  G₂ ∧ G₃ ∧ G₆ ∧ G₇ ∧ G₈

Five independent, non-compensatory veto conditions:

| Guardrail | Blocks |
|---|---|
| G₂ Epistemic Guard | Action under insufficient confidence relative to irreversibility |
| G₃ Innocence Lock | Irreversible harm on vulnerable party |
| G₆ Harm Convexity | Super-linear harm escalation at scale |
| G₇ Barrier Classes | Categorically prohibited action types (runtime-immutable) |
| G₈ Benevolent-Mask Guard | Actual harm materially exceeding declared harm |

Passing G₆ does not rescue a failure at G₃. These are membership conditions on the admissible domain, not weighted contributors to a score.
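
The conjunction can be sketched as a set of independent boolean predicates. Everything concrete here is an assumption made for illustration: the `evaluate_guardrails` helper, the feature keys, the 0.5 thresholds, and the two toy predicates are not the paper's gate definitions. The sketch only shows that each gate is checked on its own, that every failing gate is reported, and that a gate which cannot be evaluated counts as failed.

```python
from typing import Callable, Dict, List, Tuple

Gate = Callable[[dict], bool]


def evaluate_guardrails(gates: Dict[str, Gate], features: dict) -> Tuple[bool, List[str]]:
    """Return (all gates passed, names of failing gates). No gate offsets another."""
    failed = []
    for name, gate in gates.items():
        try:
            ok = bool(gate(features))
        except Exception:
            ok = False  # fail-closed: missing or malformed inputs count as a failure
        if not ok:
            failed.append(name)
    return (not failed, failed)


# Hypothetical toy predicates with made-up thresholds.
gates = {
    "G3": lambda f: not (f["I"] > 0.5 and f["M"] > 0.5),
    "G7": lambda f: f["action_type"] not in {"prohibited_example"},
}

print(evaluate_guardrails(gates, {"I": 0.1, "M": 0.9, "action_type": "benign"}))
print(evaluate_guardrails(gates, {"I": 0.9, "M": 0.9, "action_type": "benign"}))
print(evaluate_guardrails(gates, {}))
```

The last call passes no features at all, so both predicates raise and both gates are reported as failed.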


Why This Is Not RLHF

| Dimension | RLHF | This Framework |
|---|---|---|
| Core operation | Maximise expected reward | Filter to admissible set, then select |
| Safety mechanism | Soft penalties | Hard admissibility gates |
| Override possible? | Yes: sufficient reward overcomes any penalty | No: upstream failure is terminal |
| Uncertainty response | Continue optimising | Contract admissibility; fail-closed |
| Abstention | Learnable behavior | Architectural consequence |
| Auditability | Partial, post hoc | Native receipt at every decision |

Soft constraints cannot replicate hard gates at any finite penalty weight. This is not a tuning difference. It is a paradigm difference.
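
A toy comparison makes the claim concrete. This is an illustrative sketch with invented numbers and hypothetical function names, not a model of any real RLHF pipeline: a scalar objective with a finite penalty selects the violating action as soon as its reward exceeds the penalty, while a filter-then-select rule never sees it.

```python
def soft_select(actions, penalty):
    """Pick the action maximising reward minus a finite penalty for violations."""
    return max(actions, key=lambda a: a["reward"] - penalty * a["violates"])


def hard_select(actions):
    """Filter to the admissible set first, then maximise reward; abstain if it is empty."""
    admissible = [a for a in actions if not a["violates"]]
    return max(admissible, key=lambda a: a["reward"]) if admissible else None


actions = [
    {"name": "safe", "reward": 1.0, "violates": False},
    {"name": "unsafe", "reward": 1000.0, "violates": True},
]

print(soft_select(actions, penalty=100.0)["name"])  # reward outweighs the penalty
print(hard_select(actions)["name"])                 # the violation is never a candidate
print(hard_select(actions[1:]))                     # nothing admissible: abstain
```

Raising the penalty only moves the reward level at which the override happens. Abstention in the last line falls out of the empty admissible set rather than being a learned behavior.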


Measurement Space (from V4.0)

Every decision maps to a six-dimensional vector:

Ψ = { T, U, M, I, P, S }

| Symbol | Range | Meaning |
|---|---|---|
| T | [0,1] | Truth integrity: alignment between model and reality |
| U | [0,1] | Uncertainty: epistemic fog |
| M | [0,1] | Innocence/Vulnerability: defenselessness of affected parties |
| I | [0,1] | Irreversibility: permanence of harm |
| P | [0,1] | Power-seeking: concentration of control or resources |
| S | | Safety margin: distance from the barrier |

These are not ethical values. They are measurements. The guardrail set operates over this space.
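
One way to carry this vector in code is a validated, immutable record. The sketch below is an assumption-laden illustration: the class name `Psi` and the validation policy are hypothetical, and because no range is given for S above, the code only requires S to be a finite number.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Psi:
    T: float  # truth integrity
    U: float  # uncertainty
    M: float  # innocence / vulnerability
    I: float  # irreversibility
    P: float  # power-seeking
    S: float  # safety margin (range not stated in the source)

    def __post_init__(self):
        for name in ("T", "U", "M", "I", "P"):
            value = getattr(self, name)
            # NaN fails this comparison too, so malformed measurements are rejected.
            if not (0.0 <= value <= 1.0):
                raise ValueError(f"{name} must lie in [0, 1], got {value!r}")
        if not math.isfinite(self.S):
            raise ValueError(f"S must be finite, got {self.S!r}")


psi = Psi(T=0.9, U=0.1, M=0.3, I=0.3, P=0.0, S=0.5)
print(psi)
```

Rejecting out-of-range measurements at construction keeps the gates from ever reasoning over a vector that violates the stated ranges.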


Validation (V4.0 Empirical Results)

| Metric | Value |
|---|---|
| Total scenarios tested | 11,527,400 |
| Block rate | 99.9998% |
| Bypasses | 18 (all in legitimate gray zone) |
| G₃ Innocence Lock hold rate | 100% |
| Multi-agent scaling (16→28 agents) | Linear O(n), no phase transition |
| Cost of Conscience (lockout rate) | 18% |

The 18 bypasses all occurred at M ≈ 0.3, I ≈ 0.3: strong actors, reversible outcomes, minimal harm. These were correctly permitted actions, not security failures. G₃ held in 100% of cases: no bypass involved irreversible harm to a vulnerable party.

The 18% lockout rate is not a failure. It is the quantified cost of epistemic humility.
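
The reported block rate can be checked against the two counts in the table, assuming it is computed as one minus bypasses divided by total scenarios. The README does not state the exact definition, so this reading is an assumption.

```python
total_scenarios = 11_527_400
bypasses = 18

# Assumed definition: share of scenarios that did not bypass the kernel.
block_rate = 100 * (1 - bypasses / total_scenarios)
bypasses_per_million = bypasses / total_scenarios * 1_000_000

print(f"{block_rate:.4f}%")
print(f"{bypasses_per_million:.2f} bypasses per million scenarios")
```

Under that reading the figures are mutually consistent: the computed rate rounds to the 99.9998% shown in the table.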


Paper Structure

conscience_kernel_v1.0.md
│
├── Abstract
├── 1. Introduction
├── 2. Axiomatic Ethical Core
├── 3. Frozen Guardrail Formalization
├── 4. Operational Valuation Estimation
├── 5. Moral Weight and Contextual Legitimacy
├── 6. Canonical Operational Pipeline
├── 7. RLHF as a Contrasting Decision Paradigm
├── 8. Discussion, Limits, and Scope
├── Conclusion
├── Appendix A — Philosophical and Technical Lineage
│   ├── A.1 The Utilitarian Default of ML
│   ├── A.2 Kantian Deontology and the Ethical Core
│   ├── A.3 Rawlsian Contractarianism and Anonymisation
│   ├── A.4 High-Reliability Engineering and Epistemic Vetoes
│   └── A.5 Synthesis
├── Notation Reference
└── References

Philosophical Lineage

Three intellectual traditions converge in this architecture:

  • Kantian deontology — the role-reversal valuation enforces universalisability, not preference aggregation
  • Rawlsian contractarianism — the anonymisation constraint on role-reversal judgment mirrors the Veil of Ignorance
  • High-reliability engineering — the fail-closed uncertainty logic mirrors aerospace and nuclear safety: uncertain safety states default to shutdown, not statistical continuation

RLHF's scalar reward is, in its philosophical structure, Benthamite utilitarianism implemented in code. This framework is its structural inversion.


Key Properties

  • LLM-agnostic — the kernel operates as a decision layer; any model can sit beneath it
  • Fail-closed — uncertainty, missing evidence, and ambiguity default to denial, not permission
  • Auditable — every decision generates a signed receipt; every denial logs the specific failing gate
  • Non-compensatory — no gain on one dimension rescues a failure on another
  • Memory-isolated — history, trust, and reciprocity live in W only; the ethical core is time-invariant
  • Sovereign — under forced choice between operational continuity and constitution, constitution wins
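
The auditable property can be sketched with a keyed signature over a canonical serialization of the decision. The README does not specify a receipt format or signing scheme, so HMAC-SHA256 over sorted-key JSON, the field names, and the helpers `make_receipt` and `verify_receipt` are all assumptions chosen for illustration.

```python
import hashlib
import hmac
import json


def _canonical(decision: dict) -> bytes:
    """Serialize deterministically so the same decision always signs the same bytes."""
    return json.dumps(decision, sort_keys=True, separators=(",", ":")).encode("utf-8")


def make_receipt(decision: dict, key: bytes) -> dict:
    """Attach an HMAC-SHA256 signature to a decision record."""
    signature = hmac.new(key, _canonical(decision), hashlib.sha256).hexdigest()
    return {"decision": decision, "signature": signature}


def verify_receipt(receipt: dict, key: bytes) -> bool:
    """Recompute the signature and compare in constant time."""
    expected = hmac.new(key, _canonical(receipt["decision"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, receipt["signature"])


key = b"demo-key-not-for-production"  # hypothetical key for the example only
receipt = make_receipt({"admissible": False, "failed_gate": "G3"}, key)
print(verify_receipt(receipt, key))

tampered = {"decision": {"admissible": True, "failed_gate": None},
            "signature": receipt["signature"]}
print(verify_receipt(tampered, key))
```

Recording the failing gate inside the signed payload means a denial cannot later be relabelled without invalidating the receipt. A symmetric key is used here for brevity; a deployment that needs third-party verification would more plausibly use an asymmetric signature.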

Honest Scope

The kernel provides hard admissibility guarantees within the modeled action space, conditional on correct feature extraction and gate specification. It is a logic engine, not a perception engine. If the inputs are wrong, the outputs are logically correct but empirically wrong.

It is not a complete production safety stack. It does not claim to formalise all of ethics. It specifies one layer: the layer that defines what cannot be traded.


Status

| Component | Status |
|---|---|
| 10-Article Constitution (V4.0) | ✅ Complete |
| Canonical white paper (v1.0) | ✅ Complete |
| Empirical validation (11.5M scenarios) | ✅ V4.0 |
| Philosophical lineage (Appendix A) | ✅ Complete |
| Measurement space (Ψ vector) | ✅ V4.0 |
| Runtime implementation | 🔲 Planned |
| Domain-specific gate instantiation | 🔲 Planned |
| Formal verification (Coq/Lean) | 🔲 Planned |
| Hardware-enforced invariants | 🔲 Planned |

Cite

@misc{snarc_conscience_kernel_2026,
  title  = {SNARC: A Mathematical Conscience Kernel for AI Systems},
  author = {Kumuk, Burak},
  year   = {2026},
  note   = {Canonical Draft v1.0},
  url    = {https://github.com/snarcai/conscience-kernel}
}

License

CC BY 4.0 — share and adapt with attribution.


The conscience kernel does not teach machines to be good. It defines what they cannot be made to do.
