This dataset contains the data used in the work "Perturbench: Benchmarking machine learning models for cellular perturbation analysis."
The data comes from the following publications:
- Norman, T. M., Horlbeck, M. A., Replogle, J. M., Ge, A. Y., Xu, A., Jost, M., Gilbert, L. A., and Weissman, J. S. (2019). Exploring genetic interaction manifolds constructed from rich single-cell phenotypes. Science, 365(6455):786–793.
- Srivatsan, S. R., McFaline-Figueroa, J. L., Ramani, V., Saunders, L., Cao, J., Packer, J., Pliner, H. A., Jackson, D. L., Daza, R. M., Christiansen, L., Zhang, F., Steemers, F., Shendure, J., and Trapnell, C. (2020). Massively multiplex chemical transcriptomics at single-cell resolution. Science, 367(6473):45–51.
- Frangieh, C. J., Melms, J. C., Thakore, P. I., Geiger-Schuller, K. R., Ho, P., Luoma, A. M., Cleary, B., Jerby-Arnon, L., Malu, S., Cuoco, M. S., Zhao, M., Ager, C. R., Rogava, M., Hovey, L., Rotem, A., Bernatchez, C., Wucherpfennig, K. W., Johnson, B. E., Rozenblatt-Rosen, O., Schadendorf, D., Regev, A., and Izar, B. (2021). Multimodal pooled Perturb-CITE-seq screens in patient models define mechanisms of cancer immune evasion. Nat. Genet., 53(3):332–341.
- Jiang, L., Dalgarno, C., Papalexi, E., Mascio, I., Wessels, H.-H., Yun, H., Iremadze, N., Lithwick Yanai, G., Lipson, D., and Satija, R. (2024a). Systematic reconstruction of molecular pathway signatures using scalable single-cell perturbation screens. bioRxiv, page 2024.01.29.576933.
- McFaline-Figueroa, J. L., Srivatsan, S., Hill, A. J., Gasperini, M., Jackson, D. L., Saunders, L., Domcke, S., Regalado, S. G., Lazarchuck, P., Alvarez, S., et al. (2024). Multiplex single cell chemical genomics reveals the kinase dependence of the response to targeted therapy. Cell Genomics, 4(2).
- Szałata, A., Benz, A., Cannoodt, R., Cortes, M., Fong, J., Kuppasani, S., Lieberman, R., Liu, T., Mas-Rosario, J. A., Meinl, R., Nourisa, J., Tumiel, J., Tunjic, T. M., Wang, M., Weber, N., Zhao, H., Anchang, B., Theis, F. J., Luecken, M. D., and Burkhardt, D. B. (2024). A benchmark for prediction of transcriptomic responses to chemical perturbations across cell types. NeurIPS, (38).
Datasets with fixed splits ship those splits as .csv files with two columns: the first is the cell ID (matching the .obs_names of the corresponding h5ad file) and the second is the split value (train, val, or test). Some datasets have multiple splits, in which case the split files are bundled in a tar.gz archive.
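The split files can be read with a few lines of pandas. The following is a minimal sketch, not an official loader: it assumes the .csv files have no header row, and the function names `read_split_csv` and `read_split_archive` and the column names `cell_id` and `split` are hypothetical, chosen here for illustration.

```python
import io
import tarfile

import pandas as pd


def read_split_csv(source) -> pd.Series:
    """Read a two-column split file into a Series of split values indexed by cell ID.

    Assumption: the file has no header row, so column names are supplied here.
    """
    df = pd.read_csv(source, header=None, names=["cell_id", "split"], dtype=str)
    return df.set_index("cell_id")["split"]


def read_split_archive(path) -> dict[str, pd.Series]:
    """Read every .csv member of a tar.gz archive, keyed by member name."""
    splits = {}
    with tarfile.open(path, "r:gz") as archive:
        for member in archive.getmembers():
            if member.isfile() and member.name.endswith(".csv"):
                with archive.extractfile(member) as fh:
                    splits[member.name] = read_split_csv(fh)
    return splits


# Tiny in-memory stand-in for a real split file.
example = io.StringIO("CELL_1,train\nCELL_2,val\nCELL_3,test\n")
split = read_split_csv(example)

# Cell IDs assigned to the training set.
train_ids = split.index[split == "train"]
```

If the matching h5ad file is loaded with anndata as `adata`, something like `adata.obs["split"] = split.reindex(adata.obs_names)` should attach the split labels to the cells, leaving a missing value for any cell that is absent from the split file.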