Dataset: altoslabs/perturbench

This dataset contains the data used in the paper "Perturbench: Benchmarking machine learning models for cellular perturbation analysis."

The data comes from the following publications:

Norman, T. M., Horlbeck, M. A., Replogle, J. M., Ge, A. Y., Xu, A., Jost, M., Gilbert, L. A., and Weissman, J. S. (2019). Exploring genetic interaction manifolds constructed from rich single-cell phenotypes. Science, 365(6455):786–793.
Srivatsan, S. R., McFaline-Figueroa, J. L., Ramani, V., Saunders, L., Cao, J., Packer, J., Pliner, H. A., Jackson, D. L., Daza, R. M., Christiansen, L., Zhang, F., Steemers, F., Shendure, J., and Trapnell, C. (2020). Massively multiplex chemical transcriptomics at single-cell resolution. Science, 367(6473):45–51.
Frangieh, C. J., Melms, J. C., Thakore, P. I., Geiger-Schuller, K. R., Ho, P., Luoma, A. M., Cleary, B., Jerby-Arnon, L., Malu, S., Cuoco, M. S., Zhao, M., Ager, C. R., Rogava, M., Hovey, L., Rotem, A., Bernatchez, C., Wucherpfennig, K. W., Johnson, B. E., Rozenblatt-Rosen, O., Schadendorf, D., Regev, A., and Izar, B. (2021). Multimodal pooled Perturb-CITE-seq screens in patient models define mechanisms of cancer immune evasion. Nat. Genet., 53(3):332–341.
Jiang, L., Dalgarno, C., Papalexi, E., Mascio, I., Wessels, H.-H., Yun, H., Iremadze, N., Lithwick Yanai, G., Lipson, D., and Satija, R. (2024). Systematic reconstruction of molecular pathway signatures using scalable single-cell perturbation screens. bioRxiv, 2024.01.29.576933.
McFaline-Figueroa, J. L., Srivatsan, S., Hill, A. J., Gasperini, M., Jackson, D. L., Saunders, L., Domcke, S., Regalado, S. G., Lazarchuck, P., Alvarez, S., et al. (2024). Multiplex single cell chemical genomics reveals the kinase dependence of the response to targeted therapy. Cell Genomics, 4(2).
Szałata, A., Benz, A., Cannoodt, R., Cortes, M., Fong, J., Kuppasani, S., Lieberman, R., Liu, T., Mas-Rosario, J. A., Meinl, R., Nourisa, J., Tumiel, J., Tunjic, T. M., Wang, M., Weber, N., Zhao, H., Anchang, B., Theis, F. J., Luecken, M. D., and Burkhardt, D. B. (2024). A benchmark for prediction of transcriptomic responses to chemical perturbations across cell types. NeurIPS, (38).

Datasets with fixed splits include their splits as .csv files with two columns: the first is the cell ID (matching the obs_names of the corresponding .h5ad file) and the second is the split label (train, val, or test). Some datasets come with multiple alternative splits, in which case the split files are packaged in a .tar.gz archive.
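The split format described above can be consumed with a few lines of standard-library Python. This is a minimal sketch, not an official loader: it assumes each .csv has exactly two comma-separated columns and no header row (pass has_header=True if a file does have one), and the function names read_split_csv, split_indices and read_split_archive are hypothetical helpers introduced here for illustration.

```python
import csv
import io
import tarfile
from collections import defaultdict
from typing import IO, Dict, Iterable, List

# Split labels named in the dataset description.
VALID_SPLITS = {"train", "val", "test"}


def read_split_csv(handle: IO[str], has_header: bool = False) -> Dict[str, str]:
    """Map each cell ID (first column) to its split label (second column).

    Hypothetical helper; assumes no header row unless has_header=True.
    """
    reader = csv.reader(handle)
    if has_header:
        next(reader, None)  # drop the header row
    mapping: Dict[str, str] = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        if len(row) < 2:
            raise ValueError(f"expected two columns, got {row!r}")
        cell_id, split = row[0].strip(), row[1].strip()
        if split not in VALID_SPLITS:
            raise ValueError(f"unexpected split label {split!r} for {cell_id!r}")
        mapping[cell_id] = split
    return mapping


def split_indices(obs_names: Iterable[str], mapping: Dict[str, str]) -> Dict[str, List[int]]:
    """Group positions in obs_names by split; cells missing from the CSV are skipped."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for position, name in enumerate(obs_names):
        split = mapping.get(name)
        if split is not None:
            groups[split].append(position)
    return dict(groups)


def read_split_archive(path: str, has_header: bool = False) -> Dict[str, Dict[str, str]]:
    """Parse every .csv member of a .tar.gz holding multiple alternative splits."""
    splits: Dict[str, Dict[str, str]] = {}
    with tarfile.open(path, mode="r:gz") as tar:
        for member in tar.getmembers():
            if member.isfile() and member.name.endswith(".csv"):
                raw = tar.extractfile(member)
                # Decode the binary member so the csv module can read it as text.
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                splits[member.name] = read_split_csv(text, has_header=has_header)
    return splits
```

Returning positional indices rather than a filtered object keeps the helpers independent of any particular single-cell library. Assuming the anndata package is installed, the indices can be applied to the matching .h5ad file with something like adata = anndata.read_h5ad(path) followed by adata[idx["train"]], where idx = split_indices(adata.obs_names, mapping).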
