email-Enron

Zenodo | Cornell | Source Paper

email-Enron is an undirected hypergraph built from the Enron email corpus, designed for higher-order network / hypergraph machine learning. In email communication, a single message can involve more than two people; this dataset captures that group interaction by modeling each email as a hyperedge connecting the sender and all recipients, while nodes represent Enron email addresses (restricted to a core set of employees).

Usage

from datasets import load_dataset
import xgi
dataset = load_dataset("daqh/email-Enron", split="full")
hypergraphs = [xgi.from_hif_dict(d, nodetype=int, edgetype=int) for d in dataset]

Statistics

Content

The hypergraph is stored in HIF (Hypergraph Interchange Format) as a JSON object, following the schema used to exchange higher-order network data across tools. Concretely, the dataset provides the canonical HIF fields (network-type, metadata, nodes, edges, and incidences), so you can reconstruct the full incidence structure without additional processing.
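As a rough illustration of how the incidence structure can be rebuilt from these fields, the sketch below groups incidence records into hyperedges. It assumes each incidence is a dictionary with `edge` and `node` keys, as in the HIF schema; the helper name `incidences_to_edges` and the toy record are hypothetical and are not taken from the dataset.

```python
from collections import defaultdict


def incidences_to_edges(hif):
    """Group HIF incidence records into a mapping: edge id -> set of node ids."""
    members = defaultdict(set)
    for inc in hif["incidences"]:
        members[inc["edge"]].add(inc["node"])
    return dict(members)


# Toy HIF-style record with made-up ids (not real Enron data).
toy = {
    "network-type": "undirected",
    "metadata": {},
    "nodes": [{"node": 1}, {"node": 2}, {"node": 3}],
    "edges": [{"edge": 0}, {"edge": 1}],
    "incidences": [
        {"edge": 0, "node": 1},
        {"edge": 0, "node": 2},
        {"edge": 1, "node": 2},
        {"edge": 1, "node": 3},
    ],
}

edges = incidences_to_edges(toy)
for edge_id, node_ids in sorted(edges.items()):
    print(edge_id, sorted(node_ids))
```

In this reading, each key of `edges` corresponds to one email and each value to the set of addresses (sender and recipients) it connects.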

In addition to the raw hypergraph topology, vector features are provided for both nodes and hyperedges (in their attribute dictionaries), enabling out-of-the-box experimentation with representation learning and downstream tasks:

  • Spectral features: eigenvectors of the (hypergraph) Laplacian (computed via sparse eigensolvers).
  • Node2Vec embeddings: random-walk–based structural embeddings.
  • VilLain embeddings: self-supervised hypergraph representation learning via virtual label propagation.
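One way to pull such features into a matrix-like structure is sketched below. It assumes that each node or hyperedge record carries an `attrs` dictionary and that the spectral features sit under a key named `eigsh`; both the key name and the helper `feature_matrix` are assumptions made for illustration, so check the attribute keys in your copy of the data before relying on them.

```python
def feature_matrix(records, key, id_field):
    """Collect one feature vector per record.

    Returns (ids, rows), where rows[i] is the vector stored under
    attrs[key] for the record whose identifier is ids[i].
    Records that lack the feature are skipped.
    """
    ids, rows = [], []
    for rec in records:
        vec = rec.get("attrs", {}).get(key)
        if vec is None:
            continue
        ids.append(rec[id_field])
        rows.append(list(vec))
    return ids, rows


# Toy node records with made-up values (not real Enron features).
toy_nodes = [
    {"node": 0, "attrs": {"eigsh": [0.1, -0.2]}},
    {"node": 1, "attrs": {"eigsh": [0.0, 0.3]}},
    {"node": 2, "attrs": {}},  # no spectral feature stored
]

ids, X = feature_matrix(toy_nodes, key="eigsh", id_field="node")
print(ids)
print(X)
```

The same helper can be pointed at hyperedge records by passing `id_field="edge"`, and `X` can be handed to a numerical library of your choice for downstream tasks.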

Basic statistics (as packaged here): 143 nodes, 1512 hyperedges, 1 connected component.
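These counts can be checked directly from an HIF record. The following is a minimal sketch that counts nodes and hyperedges and finds connected components with a union-find over the incidences; the function name `hypergraph_stats` and the toy record are hypothetical, and two nodes are treated as connected whenever they share a hyperedge.

```python
def hypergraph_stats(hif):
    """Return (num_nodes, num_edges, num_connected_components) for an HIF dict."""
    nodes = {n["node"] for n in hif.get("nodes", [])}
    edge_ids = {e["edge"] for e in hif.get("edges", [])}
    members = {}
    for inc in hif["incidences"]:
        nodes.add(inc["node"])
        edge_ids.add(inc["edge"])
        members.setdefault(inc["edge"], []).append(inc["node"])

    # Union-find: every node starts as its own component.
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    # All nodes of a hyperedge belong to the same component.
    for node_list in members.values():
        root = find(node_list[0])
        for v in node_list[1:]:
            parent[find(v)] = root

    components = len({find(v) for v in nodes})
    return len(nodes), len(edge_ids), components


# Toy record with made-up ids: node 4 is isolated, so there are 2 components.
toy = {
    "nodes": [{"node": 1}, {"node": 2}, {"node": 3}, {"node": 4}],
    "edges": [{"edge": 0}, {"edge": 1}],
    "incidences": [
        {"edge": 0, "node": 1},
        {"edge": 0, "node": 2},
        {"edge": 1, "node": 2},
        {"edge": 1, "node": 3},
    ],
}

stats = hypergraph_stats(toy)
print(stats)
```

Applied to the record loaded from this dataset, the function would be expected to reproduce the figures quoted above, assuming the incidences follow the HIF layout described in this section.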

Citation

@article{Benson-2018-simplicial,
 author = {Benson, Austin R. and Abebe, Rediet and Schaub, Michael T. and Jadbabaie, Ali and Kleinberg, Jon},
 title = {Simplicial closure and higher-order link prediction},
 year = {2018},
 doi = {10.1073/pnas.1800683115},
 publisher = {National Academy of Sciences},
 issn = {0027-8424},
 journal = {Proceedings of the National Academy of Sciences}
}