Dmoz-tddli.rar
The data includes deep taxonomic paths (e.g., Science/Technology/Space), which makes it well suited to testing multi-level classification algorithms.
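A common way to use such paths for multi-level classification is to expand each path into one label per depth, so a model can be trained or evaluated level by level. The sketch below assumes each record carries its category as a slash-separated string like the example above. The helper names `path_levels` and `label_at_depth` are hypothetical, and the actual layout of the files inside the archive is not documented here.

```python
from typing import List, Optional


def path_levels(path: str, max_depth: Optional[int] = None) -> List[str]:
    """Expand 'A/B/C' into cumulative labels ['A', 'A/B', 'A/B/C'].

    Assumes categories are stored as slash-separated strings (hypothetical format).
    """
    # Drop leading/trailing slashes and any empty segments.
    parts = [p.strip() for p in path.strip("/").split("/") if p.strip()]
    if max_depth is not None:
        parts = parts[:max_depth]
    # Each label is the path prefix up to and including that depth.
    return ["/".join(parts[: i + 1]) for i in range(len(parts))]


def label_at_depth(path: str, depth: int) -> Optional[str]:
    """Return the label at a given depth (1 = top level), or None if the path is shallower."""
    levels = path_levels(path)
    return levels[depth - 1] if 0 < depth <= len(levels) else None


if __name__ == "__main__":
    print(path_levels("Science/Technology/Space"))
    # ['Science', 'Science/Technology', 'Science/Technology/Space']
    print(label_at_depth("Science/Technology/Space", 1))  # Science
```

Keeping the full prefix at each level (rather than just the last segment) avoids collisions when the same subcategory name appears under different parents.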
Unlike machine-generated lists, DMOZ data was curated by over 90,000 volunteer editors, making the classifications highly accurate for its time.
About Dataset
This is a URL classification dataset drawn from the DMOZ directory, with 15 classes.
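Because the inputs are URLs rather than full page text, a typical baseline splits each URL into word-like tokens and feeds them to a simple classifier. The sketch below is a minimal multinomial naive Bayes with add-one smoothing, using only the standard library. It is a swapped-in illustration, not the method used to build or benchmark this dataset; the class name `NaiveBayesURLClassifier`, the tokenizer, and the toy URLs and labels are all hypothetical stand-ins for the 15 real classes.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def tokenize_url(url: str) -> List[str]:
    """Lowercase, drop the scheme and a leading 'www.', then split on non-alphanumerics."""
    url = url.lower()
    url = re.sub(r"^[a-z]+://", "", url)
    url = re.sub(r"^www\.", "", url)
    return [t for t in re.split(r"[^a-z0-9]+", url) if t]


class NaiveBayesURLClassifier:
    """Hypothetical multinomial naive Bayes over URL tokens with Laplace smoothing."""

    def fit(self, samples: Iterable[Tuple[str, str]]) -> "NaiveBayesURLClassifier":
        self.class_counts: Counter = Counter()
        self.token_counts: Dict[str, Counter] = defaultdict(Counter)
        self.vocab: set = set()
        for url, label in samples:
            tokens = tokenize_url(url)
            self.class_counts[label] += 1
            self.token_counts[label].update(tokens)  # creates the entry even if tokens is empty
            self.vocab.update(tokens)
        self.total_tokens = {c: sum(cnt.values()) for c, cnt in self.token_counts.items()}
        return self

    def predict(self, url: str) -> str:
        tokens = tokenize_url(url)
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / n_docs)  # log prior
            for t in tokens:
                if t not in self.vocab:
                    continue  # ignore tokens never seen in training
                count = self.token_counts[label][t]
                score += math.log((count + 1) / (self.total_tokens[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    train = [
        ("http://www.nasa.gov/missions/space", "Science"),
        ("https://www.espn.com/football/scores", "Sports"),
        ("http://www.physics.org/research/quantum", "Science"),
        ("https://www.nba.com/games/basketball", "Sports"),
    ]
    clf = NaiveBayesURLClassifier().fit(train)
    print(clf.predict("http://example.org/space/research"))  # Science
```

Naive Bayes is chosen here only because it is short and dependency-free; with the real data, a linear model over the same tokens is a common next step.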
Highly recommended for researchers looking to train text-classification models or explore the historical structure of the early-to-mid-2000s internet.
Community Perspectives
“DMOZ — the Open Directory Project — officially closed today. It marks the end of an era of humans trying to catalog the entire web.” Search Engine Land · 9 years ago