I'm terribly sorry. 😧

#1
by MrDevolver - opened

Hello,

I'd like to apologize for my previous comment regarding this model, even though the original thread containing that post is no longer there.

I can't recall the exact words I used, but the point of that post was to report that the GGUF version of the model is broken.
In the meantime, I've discovered that the inference runtimes I've been using lately have broken support for MoE models in general, which would most likely affect this model as well. I haven't re-tested it with the last working version of the inference engine yet, but I believe it's fair to mention that here. If nothing else, users of llama.cpp-based applications such as LM Studio will be aware that what appears to be a broken MoE model may actually be broken MoE support at the inference-engine level!

To anyone it may concern: I'm using LM Studio Beta with the Vulkan llama.cpp runtime, and the last version in which MoE models worked correctly was v1.65.0.
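One way to tell a corrupt download apart from a runtime bug is to check that the GGUF file itself is well-formed. This is my own minimal sketch, not something from LM Studio or llama.cpp; it only parses the fixed-size GGUF header (magic, version, tensor count, metadata count):

```python
# Minimal sketch: sanity-check a GGUF file's header to help distinguish
# a corrupt/broken file from an inference-engine (runtime) bug.
import struct

def read_gguf_header(path):
    """Return (version, n_tensors, n_kv) if the file starts with a valid GGUF header."""
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError(f"not a GGUF file (magic = {magic!r})")
        # GGUF v2+ header: uint32 version, uint64 tensor count, uint64 metadata KV count
        version, n_tensors, n_kv = struct.unpack("<IQQ", f.read(20))
        return version, n_tensors, n_kv
```

If this parses cleanly but the model still produces garbage, the problem is more likely in the runtime's MoE support than in the file itself.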

@win10 I'm sorry for any trouble my previous thread may have caused.

Owner


Do you find this model useful?


I was definitely looking forward to it, because MiniMax v2.1 itself is too big for my current hardware, so any opportunity to have it, at least in the form of a smaller distilled model, was something I couldn't pass up.

Unfortunately, as I mentioned, the last time I tried it (the original version), I wasn't able to use it without errors. However, I suspect the broken output was due to broken MoE support in the inference engine itself rather than a model issue. Today I switched back to the last runtime version that generates proper output with MoE models, and right now I'm re-downloading the model from mradermacher.

I believe the mradermacher team hasn't yet updated their repository with the current version of this model, but for testing purposes it should suffice for now.

Hopefully it will work better than the last time I tried it.

In any case, I also used the GGUF My Repo space to produce a standard Q4_K_M quant of the current version of the model, so I will definitely try that one too and will update you with any news.

Update:
So the original version of the model, currently available in mradermacher's repository, no longer outputs random letters and words; the output seems properly formatted and more coherent.
Unfortunately, the model seems confused: it completely misinterpreted my prompts in its CoT. More interestingly, it speculated about whether policy allows such requests, which I think shouldn't happen, since it was based on a version of the model that had been abliterated with the Heretic method.

I still haven't tried the current version of the model (I am downloading it right now), but I guess you had reasons to create this new version, so hopefully it will work better.

Update II:
I was unable to quantize the current version of the model into the preferred MXFP4 MoE format; the closest in size was Q3_K_S, which I did try in the end, but unfortunately I observed behavior similar to the original version of the model: the AI seemed to misinterpret my prompts.
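For context on why Q3_K_S is the nearest substitute by size, here is a back-of-envelope estimate. The bits-per-weight figures below are rough community approximations (an assumption on my part, not exact values for any specific model), and the 20B parameter count is just an illustrative example:

```python
# Rough quant size comparison. BPW values are approximate, not exact
# for any particular model or llama.cpp version.
BPW = {
    "MXFP4": 4.25,   # ~4-bit floats with shared scales
    "Q4_K_M": 4.85,
    "Q3_K_S": 3.50,
}

def est_size_gb(n_params_billion, bpw):
    """Approximate file size in GB for a given parameter count and bits per weight."""
    return n_params_billion * 1e9 * bpw / 8 / 1e9

for name, bpw in BPW.items():
    print(f"{name}: ~{est_size_gb(20, bpw):.1f} GB for a 20B-parameter model")
```

By these numbers Q3_K_S comes out a couple of GB below MXFP4 and Q4_K_M a couple above, so Q3_K_S was simply the nearest option I could actually produce.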

Did you test the model? What parameter settings would you recommend? Maybe I'm setting something wrong (right now I'm using the same parameters as for the standard GPT-OSS 20B model).
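For reference, these are the settings I'm currently carrying over from GPT-OSS 20B. Treat them as my local values, not official recommendations, and the flag-rendering helper below is purely hypothetical, just to show the equivalent llama.cpp-style flags:

```python
# My current sampling settings (carried over from GPT-OSS 20B; an
# assumption on my part, not an official recommendation for this model).
sampling = {
    "temperature": 1.0,
    "top_p": 1.0,
    "top_k": 0,          # 0 disables top-k in llama.cpp-style samplers
    "repeat_penalty": 1.0,
}

def as_llama_cli_flags(params):
    """Hypothetical helper: render the dict as llama.cpp-style CLI flags."""
    flag_names = {"temperature": "--temp", "top_p": "--top-p",
                  "top_k": "--top-k", "repeat_penalty": "--repeat-penalty"}
    return " ".join(f"{flag_names[k]} {v}" for k, v in params.items())

print(as_llama_cli_flags(sampling))
```

If the distill needs something different (lower temperature, a top-k cap, etc.), please let me know and I'll re-test with those.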

Owner


Interestingly, distillation may have disrupted the Heretic abliteration.
