The Falcon H1 MLP block uses gate and down projection multipliers. This is currently unsupported by Liger Kernels.
Could support be added for a SwiGLU MLP Liger kernel with gate and down projection multipliers?
```python
class FalconH1MLP(nn.Module):
    def __init__(self, config: FalconH1Config):
        super().__init__()
        self.config = config
        self.hidden_size = config.hidden_size
        self.intermediate_size = config.intermediate_size
        self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=config.mlp_bias)
        self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=config.mlp_bias)
        self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=config.mlp_bias)
        self.act_fn = ACT2FN[config.hidden_act]
        self.gate_multiplier, self.down_multiplier = config.mlp_multipliers

    def forward(self, x):
        y = self.up_proj(x) * self.act_fn(self.gate_proj(x) * self.gate_multiplier)
        y = self.down_proj(y) * self.down_multiplier
        return y
```
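One possible workaround, sketched below under my own assumptions (this is not a Liger API, and `FalconH1StyleMLP` / `fold_multipliers` are hypothetical names): because the gate multiplier is applied *before* the activation and the down multiplier is applied to a linear output, both can be folded into the `gate_proj` and `down_proj` weights. After folding, the block computes a plain SwiGLU `down(up(x) * silu(gate(x)))`, which the existing Liger SwiGLU kernel should already cover.

```python
import torch
import torch.nn as nn


class FalconH1StyleMLP(nn.Module):
    """Minimal reference mirroring the FalconH1MLP forward pass
    (SiLU hardcoded, bias omitted for brevity)."""

    def __init__(self, hidden_size, intermediate_size, gate_mult, down_mult):
        super().__init__()
        self.gate_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.up_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.down_proj = nn.Linear(intermediate_size, hidden_size, bias=False)
        self.act_fn = nn.SiLU()
        self.gate_multiplier = gate_mult
        self.down_multiplier = down_mult

    def forward(self, x):
        y = self.up_proj(x) * self.act_fn(self.gate_proj(x) * self.gate_multiplier)
        return self.down_proj(y) * self.down_multiplier


def fold_multipliers(mlp):
    """Fold the multipliers into the weights in place.

    act(gate(x) * m_g) == act((m_g * W_g) x), and
    down(y) * m_d == (m_d * W_d) y, so the folded block is
    mathematically identical to a standard SwiGLU MLP.
    """
    with torch.no_grad():
        mlp.gate_proj.weight.mul_(mlp.gate_multiplier)
        mlp.down_proj.weight.mul_(mlp.down_multiplier)
        # With mlp_bias=True, the corresponding biases would need the
        # same scaling.
    mlp.gate_multiplier = 1.0
    mlp.down_multiplier = 1.0
    return mlp
```

This only works for inference-style weight rewriting (for training, the multipliers would change the gradient scale of the original parameters), so a fused kernel that takes the two multipliers as arguments would still be the cleaner long-term fix.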
Alternatives
No response
Additional context
No response