email-Enron

metadata dict	network-type string	nodes list	edges list	incidences list
{ "dataset_name": "daqh/email-Enron" }	undirected	[{"attrs":{"eigsh":[-0.1489892687337433,0.009346769030368263,-0.0718973262195004,0.11064189865846587(...TRUNCATED)	[{"attrs":{"eigsh":[0.02096569673443807,0.02203075838645441,-0.041671354055994314,0.0009419795848772(...TRUNCATED)	[{"edge":0,"node":40},{"edge":0,"node":41},{"edge":1,"node":83},{"edge":1,"node":5},{"edge":2,"node"(...TRUNCATED)

Zenodo | Cornell | Source Paper

email-Enron is an undirected hypergraph built from the Enron email corpus, designed for higher-order network / hypergraph machine learning. In email communication, a single message can involve more than two people; this dataset captures that group interaction by modeling each email as a hyperedge connecting the sender and all recipients, while nodes represent Enron email addresses (restricted to a core set of employees).
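The modeling step described above can be sketched on a few made-up messages. The `emails` records and the `email_to_hyperedge` helper are hypothetical, and collapsing repeated addresses within one message is an assumption rather than a documented detail of how this dataset was built.

```python
def email_to_hyperedge(sender, recipients):
    """Return the set of addresses involved in one message (sender plus recipients)."""
    return frozenset([sender, *recipients])


# Hypothetical toy messages; these are not taken from the Enron corpus.
emails = [
    {"sender": "alice", "recipients": ["bob", "carol"]},
    {"sender": "bob", "recipients": ["alice"]},
    {"sender": "carol", "recipients": ["alice", "bob", "dave"]},
]

# One hyperedge per email, each joining everyone on that message.
hyperedges = [email_to_hyperedge(m["sender"], m["recipients"]) for m in emails]

# The node set is the union of all addresses that appear in any hyperedge.
nodes = sorted(set().union(*hyperedges))

# Hyperedge sizes show why pairwise edges are not enough: sizes above 2 occur.
sizes = [len(e) for e in hyperedges]
```

A message with a single recipient yields a hyperedge of size 2, which is an ordinary graph edge; larger messages yield the group interactions that a plain graph would have to break into pairs.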

Usage

from datasets import load_dataset
import xgi
dataset = load_dataset("daqh/email-Enron", split="full")
hypergraphs = [xgi.from_hif_dict(d, nodetype=int, edgetype=int) for d in dataset]

Statistics

Content

The hypergraph is stored in HIF (Hypergraph Interchange Format) as a JSON object, following the schema used to exchange higher-order network data across tools. Concretely, the dataset provides the canonical HIF fields (network-type, metadata, nodes, edges, and incidences), so you can reconstruct the full incidence structure without additional processing.
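To illustrate how the incidence structure follows from these fields, here is a minimal sketch that works on a plain dictionary without any hypergraph library. The `hif` record is a hypothetical miniature example with the same five top-level fields, and `edge_members` and `node_memberships` are illustrative helper names, not part of any HIF tooling.

```python
from collections import defaultdict

# Hypothetical miniature HIF record; values are made up for illustration.
hif = {
    "network-type": "undirected",
    "metadata": {"dataset_name": "toy"},
    "nodes": [{"node": 0}, {"node": 1}, {"node": 2}],
    "edges": [{"edge": 0}, {"edge": 1}],
    "incidences": [
        {"edge": 0, "node": 0},
        {"edge": 0, "node": 1},
        {"edge": 1, "node": 1},
        {"edge": 1, "node": 2},
    ],
}


def edge_members(hif_dict):
    """Map each hyperedge id to the set of node ids it contains."""
    members = defaultdict(set)
    for inc in hif_dict["incidences"]:
        members[inc["edge"]].add(inc["node"])
    return dict(members)


def node_memberships(hif_dict):
    """Map each node id to the set of hyperedge ids it belongs to."""
    memberships = defaultdict(set)
    for inc in hif_dict["incidences"]:
        memberships[inc["node"]].add(inc["edge"])
    return dict(memberships)


edges_to_nodes = edge_members(hif)
nodes_to_edges = node_memberships(hif)
```

Each incidence record is one (edge, node) pair, so a single pass over the list is enough to recover both directions of the bipartite structure.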

In addition to the raw hypergraph topology, vector features are provided for both nodes and hyperedges (in their attribute dictionaries), enabling out-of-the-box experimentation with representation learning and downstream tasks:

Spectral features: eigenvectors of the (hypergraph) Laplacian (computed via sparse eigensolvers).
Node2Vec embeddings: random-walk–based structural embeddings.
VilLain embeddings: self-supervised hypergraph representation learning via virtual label propagation.
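As a rough sketch of how such features might be pulled out of the attribute dictionaries, the helper below collects one vector per record into a row-ordered list. It assumes the `eigsh` attribute key that appears in the dataset's records for the spectral features; the key names for the Node2Vec and VilLain embeddings are not stated in this card, so check the `attrs` keys of a real record before relying on any name. The toy `nodes` list and the `feature_matrix` helper are hypothetical.

```python
# Hypothetical node records in the HIF layout; the vectors are made up.
nodes = [
    {"node": 2, "attrs": {"eigsh": [-0.4, 0.5]}},
    {"node": 0, "attrs": {"eigsh": [0.1, -0.2]}},
    {"node": 1, "attrs": {"eigsh": [0.0, 0.3]}},
]


def feature_matrix(records, id_key, feature_key):
    """Return (ids, rows): one feature vector per record, sorted by id."""
    ordered = sorted(records, key=lambda r: r[id_key])
    ids = [r[id_key] for r in ordered]
    rows = [list(r["attrs"][feature_key]) for r in ordered]
    return ids, rows


# List the attribute keys actually present before choosing a feature.
available_keys = sorted(nodes[0]["attrs"].keys())

ids, X = feature_matrix(nodes, id_key="node", feature_key="eigsh")
```

The same helper applies to hyperedge records by passing `id_key="edge"`; the resulting rows can be handed to an array library of your choice.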

Basic statistics (as packaged here): 143 nodes, 1512 hyperedges, 1 connected component.
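Counts like these can be recomputed from the incidence list alone. The sketch below uses a small union-find over node ids, treating two nodes as connected when they share a hyperedge. The `hypergraph_stats` helper and the toy `incidences` list are hypothetical, and nodes that appear in no incidence are not counted here, which is an assumption of this sketch.

```python
def hypergraph_stats(incidences):
    """Count nodes, hyperedges, and connected components from (edge, node) pairs."""
    parent = {}

    def find(x):
        # Register unseen nodes as their own root, then follow parents upward.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    first_node = {}  # first node seen for each hyperedge
    for inc in incidences:
        edge, node = inc["edge"], inc["node"]
        find(node)
        if edge in first_node:
            # Merge this node's component with the hyperedge's component.
            parent[find(node)] = find(first_node[edge])
        else:
            first_node[edge] = node

    roots = {find(n) for n in parent}
    return {
        "nodes": len(parent),
        "hyperedges": len(first_node),
        "components": len(roots),
    }


# Hypothetical toy incidences: edges 0 and 1 share node 1; edge 2 is separate.
incidences = [
    {"edge": 0, "node": 0},
    {"edge": 0, "node": 1},
    {"edge": 1, "node": 1},
    {"edge": 1, "node": 2},
    {"edge": 2, "node": 3},
    {"edge": 2, "node": 4},
]

stats = hypergraph_stats(incidences)
```

Run on the `incidences` field of the actual record, a check like this should agree with the figures quoted above if the packaging matches the description.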

Citation

@article{Benson-2018-simplicial,
 author = {Benson, Austin R. and Abebe, Rediet and Schaub, Michael T. and Jadbabaie, Ali and Kleinberg, Jon},
 title = {Simplicial closure and higher-order link prediction},
 year = {2018},
 doi = {10.1073/pnas.1800683115},
 publisher = {National Academy of Sciences},
 issn = {0027-8424},
 journal = {Proceedings of the National Academy of Sciences}
}


Size of downloaded dataset files: 13.6 MB
Size of the auto-converted Parquet files: 5.21 MB
