AI & ML interests

Breaking the opacity of language models for legal professionals 📖 Join us by smashing the button at the top right 🤗

Recent Activity

Tonic 
posted an update 18 days ago
🙋🏻‍♂️ Hey there folks,

Turns out: if we predict 🌏 Earth, we can spend more time looking for interesting things and less time looking at things we expect to see.

Sentinel-2 imagery 🛰️ takes a long time to downlink to Earth, so our "near real time" systems are quite far from that in practical terms.

Meanwhile, if we "predict" what we will see based on what we do see, we can send down much less data in a timely way and prioritize 📡 Earth-bound response.

I'm talking about illegal fishing, logging, mining or building in nature reserves. The more of that we predict early, the more of it we're able to stop in time.

At least that's the concept!
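
A minimal sketch of that concept, assuming the imagery is cut into tiles and that some onboard model produces a predicted tile for each observed one. The function names, the mean-absolute-difference score, and the `budget` parameter are all illustrative assumptions and are not taken from the NuTonic code or models; the sketch only shows how "surprising" tiles could be ranked for downlink first.

```python
from typing import Dict, List, Sequence

# Hypothetical representation: a tile is a 2D grid of pixel intensities.
Tile = Sequence[Sequence[float]]


def surprise(predicted: Tile, observed: Tile) -> float:
    """Mean absolute difference between a predicted and an observed tile."""
    total = 0.0
    count = 0
    for pred_row, obs_row in zip(predicted, observed):
        for p, o in zip(pred_row, obs_row):
            total += abs(p - o)
            count += 1
    return total / count if count else 0.0


def prioritize_downlink(
    predicted: Dict[str, Tile],
    observed: Dict[str, Tile],
    budget: int,
) -> List[str]:
    """Return the ids of the `budget` tiles that deviate most from prediction."""
    scores = {
        tile_id: surprise(predicted[tile_id], observed[tile_id])
        for tile_id in observed
        if tile_id in predicted
    }
    # Highest surprise first: these are the tiles worth sending down early.
    ranked = sorted(scores, key=lambda tile_id: scores[tile_id], reverse=True)
    return ranked[:budget]


if __name__ == "__main__":
    predicted = {"a": [[0.0, 0.0]], "b": [[1.0, 1.0]], "c": [[0.5, 0.5]]}
    observed = {"a": [[0.0, 0.1]], "b": [[0.0, 0.0]], "c": [[0.5, 0.9]]}
    print(prioritize_downlink(predicted, observed, budget=2))  # ['b', 'c']
```

Tiles that match the prediction closely score near zero and can wait, while tiles that differ strongly are sent first. A real system would likely use a learned or perceptual distance rather than a raw pixel difference.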

Check out the blog: https://huggingface.co/blog/Tonic/save-patagonia-by-predicting-earth


- Collection: https://huggingface.co/collections/NuTonic/earth-observation-with-temporal-and-general-understanding
- Code: https://github.com/Josephrp/Nutonic
- Dataset: NuTonic/sat-vl-sft-training-ready-v1
- Model: NuTonic/lspace
- Training: NuTonic/lspace-trackio
- Evals: NuTonic/Patagonia_Eval
Tonic 
posted an update about 1 month ago
🙋🏻‍♂️ Hey there folks,

Since everyone liked my previous announcement post ( https://huggingface.co/posts/Tonic/338509028435394 ) so much, I'm back with more high-quality procedural datasets in the geospatial domain for SFT training!

Check this one out:
NuTonic/sat-bbox-metadata-sft-v1

The goal is to be able to train vision models on multiple images for remote sensing analysis with one shot.

Hope you like it! 🚀
Tonic 
posted an update about 1 month ago
🙋🏻‍♂️ Hey there folks,

I'm sharing Hugging Face's largest dataset of annotated satellite images today.

Check it out here: NuTonic/sat-image-boundingbox-sft-full

I hope you like it. The idea is to be able to use this with small vision models 🚀
umarbutler 
posted an update 2 months ago
Isaacus, the AI research company building legal superintelligence, is hiring!

We're looking for passionate engineers who love to build and tinker and want to have an impact on the world. Specifically, we're hiring:
• ML engineers (Australia).
• Data engineers (Australia).
• Full-stack engineers (Australia).
• DevRel engineers (Australia, San Francisco, and London).
• DevOps engineers (Australia, San Francisco, and London).

If you'd like to be a founding employee at one of the few VC-backed LLM research labs in the world, receive generous equity compensation, and work alongside other highly motivated, highly skilled engineers, get in touch: https://isaacus.com/careers
Nymbo 
posted an update 3 months ago
We should really have a release date range slider on the /models page. Tired of "trending/most downloaded" being the best way to sort and still seeing models from 2023 on the first page just because they're embedded in enterprise pipelines and get downloaded repeatedly. "Recently Created/Recently Updated" don't solve the discovery problem considering the amount of noise to sift through.

Slight caveat: Trending actually does have some recency bias, but it's not strong/precise enough.
umarbutler 
posted an update 3 months ago
This awesome visualization by @abdurrahmanbutler tracks how reliant the High Court of Australia has been on UK precedents over time.

Back in the early 1900s, up to 70% of citations in High Court decisions were from the UK. Today, that number sits around 20%.

This change seems to have happened gradually as Australia gained more and more independence from the UK, culminating in the Australia Acts of 1986, where we see a nice bump in the proportion of Australian cases cited.

These insights would not be possible without our latest legal AI model, Kanon 2 Enricher, which we used to extract dates and citations from High Court decisions in isaacus/open-australian-legal-corpus and categorize citations by jurisdiction. You can learn about Kanon 2 Enricher here: https://isaacus.com/blog/kanon-2-enricher.
Tonic 
posted an update 3 months ago
🤔 Who would win?

- a fully subsidized AI lab
OR
- 3 random students named kurakurai?

Demo: Tonic/fr-on-device

If you like it, give the demo a little star and send a shoutout to @MaxLSB, @jddqd and @GAD-cell for absolutely obliterating the Pareto frontier of French language understanding.
umarbutler 
posted an update 3 months ago
@abdurrahmanbutler and I just dropped Legal RAG Bench, the first benchmark for legal RAG systems to simultaneously evaluate hallucinations, retrieval failures, and reasoning errors.

Our key takeaways are:
1. Embedding models, not generative models, are the primary driver of RAG accuracy. Switching from a general-purpose embedder like OpenAI's Text Embedding 3 Large to a legal domain embedder like Isaacus' Kanon 2 Embedder can raise accuracy by ~19 points.
2. Hallucinations are often triggered by retrieval failures. Fix your retrieval stack, and, in most cases, you end up fixing hallucinations.
3. Once you have a solid legal retrieval engine like Kanon 2 Embedder, it doesn’t matter as much what generative model you use; GPT-5.2 and Gemini 3.1 Pro perform relatively similarly, with Gemini 3.1 Pro achieving slightly better accuracy at the cost of more hallucinations.
4. Google's latest LLM, Gemini 3.1 Pro, is actually a bit worse than its predecessor at legal RAG, achieving 79.3% accuracy instead of 80.3%.

These findings confirm what we already knew at Isaacus: that information retrieval sets the ceiling on the accuracy of legal RAG systems. It doesn’t matter how smart you are; you aren’t going to magically know what the penalty is for speeding in California without access to an up-to-date copy of the California Vehicle Code.

Even still, to our knowledge, we’re the first to actually show this empirically.

Unfortunately, as we highlight in our write-up, high-quality open legal benchmarks like Legal RAG Bench and our earlier MLEB are few and far between.

In the interests of transparency, we have not only detailed exactly how we built Legal RAG Bench, but we’ve also released all of our data openly on Hugging Face. You can read our write-up [here](https://isaacus.com/blog/legal-rag-bench), noting that we’ll soon be publishing it as a paper.

Kudos to my brother @abdurrahmanbutler for serving as the lead author on this monumental release.
Tonic 
posted an update 3 months ago
🙋🏻‍♂️ Hello my lovelies,

It is with great pleasure that I present to you my working, one-click, completely free Hugging Face Spaces deployment with 16GB of RAM.

Repo: Tonic/hugging-claw (use git clone to inspect)
Literally the one-click link: Tonic/hugging-claw

You can also run it locally and see for yourself:

docker run -it -p 7860:7860 --platform=linux/amd64 \
-e HF_TOKEN="YOUR_VALUE_HERE" \
-e OPENCLAW_GATEWAY_TRUSTED_PROXIES="YOUR_VALUE_HERE" \
-e OPENCLAW_GATEWAY_PASSWORD="YOUR_VALUE_HERE" \
-e OPENCLAW_CONTROL_UI_ALLOWED_ORIGINS="YOUR_VALUE_HERE" \
registry.hf.space/tonic-hugging-claw:latest


Just a few quite minor details I'll take care of, but I wanted to share here first.
AdinaY 
posted an update 4 months ago
MiniMax M2.5 is now available on the hub 🚀

MiniMaxAI/MiniMax-M2.5

✨ 229B - Modified MIT license
✨ 37% faster than M2.1
✨ ~$1/hour at 100 TPS
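
As a rough worked example of what that last figure implies, the sketch below converts an hourly cost at a given throughput into a cost per million tokens. It assumes "TPS" means tokens generated per second and that the rate is sustained for the full hour; the function name is illustrative and the result is only arithmetic on the two numbers quoted above, not an official price.

```python
def cost_per_million_tokens(dollars_per_hour: float, tokens_per_second: float) -> float:
    """Convert an hourly cost at a sustained throughput into dollars per 1M tokens."""
    tokens_per_hour = tokens_per_second * 3600  # seconds in one hour
    return dollars_per_hour / tokens_per_hour * 1_000_000


if __name__ == "__main__":
    # 100 tokens/s for one hour is 360,000 tokens, so $1 buys 0.36M tokens.
    print(round(cost_per_million_tokens(1.0, 100.0), 2))  # 2.78
```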
AdinaY 
posted an update 4 months ago