`logs/run_1/README.md` (18 additions, 10 deletions)
**Total Duration:** ~58 Hours
## Scientific Summary

This run serves as the alpha validation of the **Neuromodulatory Control Network (NCN)** architecture. The model was trained for **1 epoch** on the **TinyStories** dataset (mapped to binary format) to verify the hypothesis that neuromodulatory gating can improve sample efficiency on narrative data.

Unlike standard Transformer training, this experiment tests the hypothesis that a parallel hypernetwork can **implicitly learn an optimal processing strategy** (Section 2.1 of the paper) by modulating the main network's gain, precision, and gating dynamics. The goal was to observe whether the NCN could stabilize without "Entropy Shock" and achieve competitive perplexity through dynamic resource allocation rather than static weight optimization.
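
The model code is not included in this log, but the mechanism under test can be sketched in PyTorch. Everything below is a hypothetical illustration, not the project's actual API: `NCNController`, the mean-pooled summary, and the activation choices are assumptions. The controller reads the hidden state and emits per-layer `gain`, `precision`, and `gate` signals:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NCNController(nn.Module):
    """Hypothetical sketch of the parallel hypernetwork: maps a pooled
    summary of the hidden state to three control signals per layer."""

    def __init__(self, d_model: int, n_layers: int):
        super().__init__()
        self.proj = nn.Linear(d_model, 3 * n_layers)

    def forward(self, h: torch.Tensor):
        # h: (batch, seq, d_model); pool over the sequence for a cheap summary
        ctrl = self.proj(h.mean(dim=1))         # (batch, 3 * n_layers)
        g, beta, gate = ctrl.chunk(3, dim=-1)   # each (batch, n_layers)
        g = F.softplus(g)                       # multiplicative gain, g > 0
        beta = F.softplus(beta)                 # attention precision (inverse temperature)
        gate = torch.sigmoid(gate)              # soft layer gating in (0, 1)
        return g, beta, gate
```

A modulated block would then combine these signals roughly as `h = h + gate[:, l, None, None] * g[:, l, None, None] * block(h, beta=beta[:, l])`, so the controller varies the network's effective depth and attention sharpness per input rather than per weight.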

## Theoretical Hypotheses Tested

This run specifically targets three biological mechanisms proposed in the NCN paper:

1. **Thermodynamic Regulation (Exploration vs. Exploitation):** Can the `precision` signal ($\beta$) dynamically regulate the entropy of the attention mechanism, mimicking the signal-to-noise-ratio modulation of Norepinephrine? (See the sketch after this list.)
2. **Gradient Shielding:** Does the multiplicative `gain` ($g$) allow the model to selectively down-regulate layers during specific contexts, theoretically shielding specialized weights from catastrophic interference (the Plasticity-Stability Dilemma)?
3. **Metabolic Efficiency:** Verifying whether **Homeostatic Regularization** ($\mathcal{L}_{reg}$) prevents the control manifold from collapsing into a rigid state or exploding.
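
The sketch referenced in item 1 follows; it covers plausible forms of hypotheses 1 and 3. Neither function is taken from the paper: `precision_scaled_attention` simply folds $\beta$ into the softmax temperature, and `homeostatic_reg` is one possible reading of $\mathcal{L}_{reg}$ as a set-point penalty. Both names are invented for illustration.

```python
import torch

def precision_scaled_attention(q, k, v, beta):
    """Hypothesis 1: higher beta sharpens the attention distribution
    (lower entropy), lower beta flattens it -- analogous to
    norepinephrine modulating signal-to-noise ratio."""
    d = q.size(-1)
    logits = beta * (q @ k.transpose(-2, -1)) / d ** 0.5
    return torch.softmax(logits, dim=-1) @ v

def homeostatic_reg(g, beta, gate):
    """Hypothesis 3: penalize control signals that drift from a neutral
    set point, so the control manifold neither collapses into a rigid
    state nor explodes."""
    return ((g - 1.0) ** 2 + (beta - 1.0) ** 2 + (gate - 0.5) ** 2).mean()
```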
## Final Metrics

| Metric | Value |
|--------|-------|
…

* **Initialization:** Bias Initialization Strategy (Section 4.1.4 of the paper) applied to prevent "Metabolic Throttling." A sketch of this scheme follows below.
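
The exact recipe of Section 4.1.4 is not reproduced in this log. Assuming the controller head uses `softplus` for gain/precision and `sigmoid` for the gates (as in the sketch above), the idea would be to initialize the output biases so the NCN starts as a near-identity modulator, letting the backbone behave like a vanilla Transformer at step 0 instead of being throttled. `init_controller_biases` is a hypothetical name:

```python
import math
import torch
import torch.nn as nn

def init_controller_biases(proj: nn.Linear) -> None:
    """Hypothetical bias init: start with gain ~= 1, precision ~= 1,
    and gates ~= open, so the modulated network is near-identity."""
    nn.init.zeros_(proj.weight)  # control signals ignore the input at first
    with torch.no_grad():
        g_b, beta_b, gate_b = proj.bias.chunk(3)
        g_b.fill_(math.log(math.e - 1))     # softplus(ln(e - 1)) = 1
        beta_b.fill_(math.log(math.e - 1))  # neutral attention temperature
        gate_b.fill_(4.0)                   # sigmoid(4) ~= 0.98: gates open
```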

## Training Dynamics & Observations

The log confirms the efficacy of the **Bias Initialization Strategy** described in Section 4.1.4. The model avoided the "Entropy Shock" typical of hypernetworks; the loss curve shows immediate, stable descent from step 0.

Training was stable, with no loss spikes. The `grad_clip` of 1.0 was rarely triggered after the warmup phase, and the NCN parameters introduced a computational overhead of under 2% relative to a vanilla forward pass.

The final validation perplexity of **4.5184**, from a single pass over the data with an 18M parameter model, suggests that the NCN is successfully compressing the loss manifold by dynamically altering the effective depth and sharpness of the network per token, rather than treating every token with uniform computational intensity.
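
As a sanity check on the headline number: perplexity is just the exponentiated mean cross-entropy loss (in nats), so the reported value corresponds to a final validation loss of roughly 1.51.

```python
import math

val_loss = math.log(4.5184)      # ~= 1.508 nats of mean cross-entropy
perplexity = math.exp(val_loss)  # recovers the reported 4.5184
```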

**Log file:** `training.log` (attached in this directory)