Hugging Face
Datasets: 95
Sort: Trending
Active filters: language-modeling
allenai/dolma • Updated Apr 17, 2024 • 7.34k • 997
Skywork/SkyPile-150B • Viewer • Updated Dec 7, 2023 • 1.76M • 8.72k • 402
ai4bharat/sangraha • Viewer • Updated Mar 5, 2025 • 268M • 20.1k • 67
orionweller/mmBERT-pretrain-p1-fineweb2-langs • Updated Oct 13, 2025 • 76 • 7
omdeep22/Konkani_books_corpus-v2 • Viewer • Updated Jan 28 • 8.23M • 62 • 1
aisingapore/SEA-Instruct-2602 • Viewer • Updated Feb 6 • 7.2M • 321 • 1
nthngdy/bert_dataset_202203 • Viewer • Updated Jan 17, 2023 • 147M • 259
EleutherAI/pythia-memorized-evals • Viewer • Updated 17 days ago • 31.4M • 138 • 3
RyokoExtra/SuperWIKI • Preview • Updated Nov 20, 2023 • 11 • 1
RyokoExtra/SuperWIKI-Cleaned • Viewer • Updated Sep 8, 2023 • 221k • 23 • 4
yjkim27/The-Philosophy-Data-Project • Viewer • Updated Dec 18, 2023 • 396k • 43 • 8
nicholasKluge/Pt-Corpus-Instruct-tokenized • Viewer • Updated May 31, 2025 • 3.06M • 137
nicholasKluge/Pt-Corpus-Instruct • Viewer • Updated Jun 18, 2024 • 10.6M • 300 • 3
nicholasKluge/Pt-Corpus • Viewer • Updated Jun 18, 2024 • 5.77M • 701 • 3
nicholasKluge/Pt-Corpus-tokenized • Viewer • Updated Jun 18, 2024 • 2.02M • 234
portkey/truthful_qa_context • Viewer • Updated Jan 28, 2024 • 817 • 13 • 8
cfilt/IITB-IndicMonoDoc • Updated Feb 18, 2025 • 18.3k • 10
EikESousA/anvisa_instruct_tokenized • Viewer • Updated Apr 18, 2024 • 3.06M • 64
emozilla/dolma-v1_7-305B • Viewer • Updated May 13, 2024 • 343M • 830 • 10
emozilla/dolma-v1_7-30B • Viewer • Updated May 23, 2024 • 34.5M • 190
emozilla/dolma-v1_7-30B-tokenized-llama3-nanoset • Updated May 20, 2024 • 52
emozilla/dolma-v1_7-305B-tokenized-llama3-nanoset • Updated May 29, 2024 • 167
emozilla/dolma-v1_7-3B-tokenized-llama3-nanoset • Updated May 23, 2024 • 19
emozilla/dolma-v1_7-3B • Viewer • Updated May 23, 2024 • 3.4M • 165
emozilla/dolma-v1_7-30B-tokenized-llama2-nanoset • Updated Jul 9, 2024 • 74
meg/dolma-v1_6-sample • Updated May 31, 2024 • 36
TucanoBR/GigaVerbo-Text-Filter • Viewer • Updated Jul 24, 2025 • 110k • 137 • 8
TucanoBR/GigaVerbo • Viewer • Updated Jul 24, 2025 • 145M • 1.26k • 30
faur-ai/fulg • Preview • Updated Aug 15, 2024 • 14.1k • 10
TucanoBR/Tucano-SFT • Viewer • Updated Nov 13, 2024 • 680k • 90 • 2