email-Enron
Zenodo | Cornell | Source Paper
email-Enron is an undirected hypergraph built from the Enron email corpus, designed for higher-order network and hypergraph machine learning. A single email can involve more than two people; this dataset captures that group interaction by modeling each email as a hyperedge connecting the sender and all recipients. Nodes represent Enron email addresses, restricted to a core set of employees.
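To make the modeling choice concrete, here is a minimal sketch of how one message could be turned into an undirected hyperedge. The message layout (`sender`, `recipients`), the helper name `email_to_hyperedge`, and the addresses are all invented for illustration; this is not the pipeline used to build the dataset.

```python
# Toy messages; the addresses and field names are hypothetical.
emails = [
    {"sender": "a@enron.com", "recipients": ["b@enron.com", "c@enron.com"]},
    {"sender": "b@enron.com", "recipients": ["a@enron.com"]},
    {"sender": "a@enron.com", "recipients": ["c@enron.com", "b@enron.com"]},
]


def email_to_hyperedge(message):
    """Return the set of addresses (sender plus recipients) for one message."""
    return frozenset([message["sender"], *message["recipients"]])


hyperedges = [email_to_hyperedge(m) for m in emails]

# Number of participants in each hyperedge.
sizes = [len(e) for e in hyperedges]

# In an undirected hypergraph the sender/recipient roles are dropped,
# so messages 0 and 2 map to the same node set.
same_node_set = hyperedges[0] == hyperedges[2]
```

Whether repeated node sets are merged or kept as separate hyperedges in the packaged data is not stated here, so the sketch leaves them as separate entries.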
Usage
from datasets import load_dataset
import xgi
dataset = load_dataset("daqh/email-Enron", split="full")
hypergraphs = [xgi.from_hif_dict(d, nodetype=int, edgetype=int) for d in dataset]
Content
The hypergraph is stored in HIF (Hypergraph Interchange Format) as a JSON object, following the schema used to exchange higher-order network data across tools. Concretely, the dataset provides the canonical HIF fields (network-type, metadata, nodes, edges, and incidences), so you can reconstruct the full incidence structure without additional processing.
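As a rough illustration of what "reconstructing the incidence structure" means, the sketch below groups the incidence pairs of a HIF-style record into edge-to-nodes and node-to-edges mappings. The record is hand-written toy data with the five fields named above, and the helper names are hypothetical; a real record from the dataset would also carry attribute dictionaries on nodes and edges.

```python
from collections import defaultdict

# Toy record with the five HIF fields; values are not taken from the dataset.
record = {
    "network-type": "undirected",
    "metadata": {"dataset_name": "toy"},
    "nodes": [{"node": 0}, {"node": 1}, {"node": 2}],
    "edges": [{"edge": 0}, {"edge": 1}],
    "incidences": [
        {"edge": 0, "node": 0},
        {"edge": 0, "node": 1},
        {"edge": 1, "node": 1},
        {"edge": 1, "node": 2},
    ],
}


def members_by_edge(hif):
    """Map each edge id to the set of node ids incident to it."""
    members = defaultdict(set)
    for pair in hif["incidences"]:
        members[pair["edge"]].add(pair["node"])
    return dict(members)


def edges_by_node(hif):
    """Map each node id to the set of edge ids it belongs to."""
    memberships = defaultdict(set)
    for pair in hif["incidences"]:
        memberships[pair["node"]].add(pair["edge"])
    return dict(memberships)


edge_members = members_by_edge(record)
node_memberships = edges_by_node(record)
```

Both mappings come from a single pass over `incidences`, which is why no additional processing of the `nodes` and `edges` lists is needed to recover the topology.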
In addition to the raw hypergraph topology, vector features are provided for both nodes and hyperedges (in their attribute dictionaries), enabling out-of-the-box experimentation with representation learning and downstream tasks:
- Spectral features: eigenvectors of the (hypergraph) Laplacian (computed via sparse eigensolvers).
- Node2Vec embeddings: random-walk–based structural embeddings.
- VilLain embeddings: self-supervised hypergraph representation learning via virtual label propagation.
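One way to use these features is to stack the per-node (or per-hyperedge) vectors into a row-per-entity matrix. The sketch below assumes each entry keeps its vectors in an `attrs` dictionary and that the spectral features sit under a key named `eigsh`; both the key name and the toy values are assumptions, so check the attribute keys of a loaded record before relying on them.

```python
def feature_matrix(entries, id_key, attr_key):
    """Return (ids, rows) with one feature vector per entry, sorted by id.

    entries  -- list of HIF node or edge dictionaries
    id_key   -- "node" for nodes, "edge" for hyperedges
    attr_key -- name of the feature inside each entry's "attrs" dictionary
    """
    ordered = sorted(entries, key=lambda entry: entry[id_key])
    ids = [entry[id_key] for entry in ordered]
    rows = [list(entry["attrs"][attr_key]) for entry in ordered]
    return ids, rows


# Toy node entries; the vectors are made up and far shorter than real ones.
toy_nodes = [
    {"node": 2, "attrs": {"eigsh": [0.5, -0.1]}},
    {"node": 0, "attrs": {"eigsh": [0.1, 0.2]}},
    {"node": 1, "attrs": {"eigsh": [-0.3, 0.0]}},
]

node_ids, X = feature_matrix(toy_nodes, id_key="node", attr_key="eigsh")
```

The same helper works for hyperedges by passing the `edges` list with `id_key="edge"`. The returned list of rows can be handed to `numpy.asarray` or a tensor constructor if a dense array is needed.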
Basic statistics (as packaged here): 143 nodes, 1512 hyperedges, 1 connected component.
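Counts like these can be checked directly from the incidence list. The following is a minimal sketch, not the procedure used to produce the numbers above: it counts distinct nodes and hyperedges and finds connected components with a union-find over the bipartite node/edge structure. It only sees nodes that appear in at least one incidence, so isolated nodes listed under `nodes` would need to be added separately. The function name and toy data are hypothetical.

```python
def hypergraph_stats(incidences):
    """Count nodes, hyperedges, and connected components from incidence pairs."""
    parent = {}

    def find(x):
        # Register unseen items as their own root, then walk up with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    nodes, edges = set(), set()
    for pair in incidences:
        # Tag ids so that node 0 and edge 0 are not confused.
        node = ("node", pair["node"])
        edge = ("edge", pair["edge"])
        nodes.add(node)
        edges.add(edge)
        union(node, edge)

    components = {find(node) for node in nodes}
    return {"nodes": len(nodes), "edges": len(edges), "components": len(components)}


# Toy incidences: edges 0 and 1 share node 1; edge 2 is disconnected from them.
toy_incidences = [
    {"edge": 0, "node": 0}, {"edge": 0, "node": 1},
    {"edge": 1, "node": 1}, {"edge": 1, "node": 2},
    {"edge": 2, "node": 3}, {"edge": 2, "node": 4},
]

stats = hypergraph_stats(toy_incidences)
```

Applied to the `incidences` field of the packaged record, the same function would be expected to agree with the figures listed above, provided every node appears in at least one hyperedge.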
Citation
@article{Benson-2018-simplicial,
author = {Benson, Austin R. and Abebe, Rediet and Schaub, Michael T. and Jadbabaie, Ali and Kleinberg, Jon},
title = {Simplicial closure and higher-order link prediction},
year = {2018},
doi = {10.1073/pnas.1800683115},
publisher = {National Academy of Sciences},
issn = {0027-8424},
journal = {Proceedings of the National Academy of Sciences}
}