SPSS case data. Link: https://pan.baidu.com/s/1CP8PrXCqXi985Sh-37VoeQ?pwd=xybl Extraction code: xybl
Machine learning datasets (UCI Machine Learning Repository: Data Sets)
Name | Default Task | # Instances | # Attributes | Year |
Abalone | Classification | 4177 | 8 | 1995 |
Adult | Classification | 48842 | 14 | 1996 |
Annealing | Classification | 798 | 38 | |
Anonymous Microsoft Web Data | Recommender-Systems | 37711 | 294 | 1998 |
Arrhythmia | Classification | 452 | 279 | 1998 |
Artificial Characters | Classification | 6000 | 7 | 1992 |
Audiology (Original) | Classification | 226 | | 1987 |
Audiology (Standardized) | Classification | 226 | 69 | 1992 |
Auto MPG | Regression | 398 | 8 | 1993 |
Automobile | Regression | 205 | 26 | 1987 |
Badges | Classification | 294 | 1 | 1994 |
Balance Scale | Classification | 625 | 4 | 1994 |
Balloons | Classification | 16 | 4 | |
Breast Cancer | Classification | 286 | 9 | 1988 |
Breast Cancer Wisconsin (Original) | Classification | 699 | 10 | 1992 |
Breast Cancer Wisconsin (Prognostic) | Classification, Regression | 198 | 34 | 1995 |
Breast Cancer Wisconsin (Diagnostic) | Classification | 569 | 32 | 1995 |
Pittsburgh Bridges | Classification | 108 | 13 | 1990 |
Car Evaluation | Classification | 1728 | 6 | 1997 |
Census Income | Classification | 48842 | 14 | 1996 |
Chess (King-Rook vs. King-Knight) | Classification | | 22 | 1988 |
Chess (King-Rook vs. King-Pawn) | Classification | 3196 | 36 | 1989 |
Chess (King-Rook vs. King) | Classification | 28056 | 6 | 1994 |
Chess (Domain Theories) | | | | |
Bach Chorales | | 100 | 6 | |
Connect-4 | Classification | 67557 | 42 | 1995 |
Credit Approval | Classification | 690 | 15 | |
Japanese Credit Screening | Classification | 125 | | 1992 |
Computer Hardware | Regression | 209 | 9 | 1987 |
Contraceptive Method Choice | Classification | 1473 | 9 | 1997 |
Covertype | Classification | 581012 | 54 | 1998 |
Cylinder Bands | Classification | 512 | 39 | 1995 |
Dermatology | Classification | 366 | 33 | 1998 |
Diabetes | | | 20 | |
DGP2 - The Second Data Generation Program | | | | |
Document Understanding | | | | 1994 |
EBL Domain Theories | | | | |
Echocardiogram | Classification | 132 | 12 | 1989 |
Ecoli | Classification | 336 | 8 | 1996 |
Flags | Classification | 194 | 30 | 1990 |
Function Finding | Function-Learning | 352 | | 1990 |
Glass Identification | Classification | 214 | 10 | 1987 |
Haberman's Survival | Classification | 306 | 3 | 1999 |
Hayes-Roth | Classification | 160 | 5 | 1989 |
Heart Disease | Classification | 303 | 75 | 1988 |
Hepatitis | Classification | 155 | 19 | 1988 |
Horse Colic | Classification | 368 | 27 | 1989 |
ICU | | | | |
Image Segmentation | Classification | 2310 | 19 | 1990 |
Internet Advertisements | Classification | 3279 | 1558 | 1998 |
Ionosphere | Classification | 351 | 34 | 1989 |
Iris | Classification | 150 | 4 | 1988 |
ISOLET | Classification | 7797 | 617 | 1994 |
Kinship | Relational-Learning | 104 | 12 | 1990 |
Labor Relations | | 57 | 16 | 1988 |
LED Display Domain | Classification | | 7 | 1988 |
Lenses | Classification | 24 | 4 | 1990 |
Letter Recognition | Classification | 20000 | 16 | 1991 |
Liver Disorders | | 345 | 7 | 1990 |
Logic Theorist | | | | |
Lung Cancer | Classification | 32 | 56 | 1992 |
Lymphography | Classification | 148 | 18 | 1988 |
Mechanical Analysis | Classification | 209 | 8 | 1990 |
Meta-data | Classification | 528 | 22 | 1996 |
Mobile Robots | | | | 1995 |
Molecular Biology (Promoter Gene Sequences) | Classification | 106 | 58 | 1990 |
Molecular Biology (Protein Secondary Structure) | Classification | 128 | | |
Molecular Biology (Splice-junction Gene Sequences) | Classification | 3190 | 61 | 1992 |
MONK's Problems | Classification | 432 | 7 | 1992 |
Moral Reasoner | | 202 | | 1994 |
Multiple Features | Classification | 2000 | 649 | |
Mushroom | Classification | 8124 | 22 | 1987 |
Musk (Version 1) | Classification | 476 | 168 | 1994 |
Musk (Version 2) | Classification | 6598 | 168 | 1994 |
Nursery | Classification | 12960 | 8 | 1997 |
Othello Domain Theory | | | | 1991 |
Page Blocks Classification | Classification | 5473 | 10 | 1995 |
Optical Recognition of Handwritten Digits | Classification | 5620 | 64 | 1998 |
Pen-Based Recognition of Handwritten Digits | Classification | 10992 | 16 | 1998 |
Post-Operative Patient | Classification | 90 | 8 | 1993 |
Primary Tumor | Classification | 339 | 17 | 1988 |
Prodigy | | | | |
Qualitative Structure Activity Relationships | | | | |
Quadruped Mammals | Classification | | 72 | 1992 |
Servo | Regression | 167 | 4 | 1993 |
Shuttle Landing Control | Classification | 15 | 6 | 1988 |
Solar Flare | Regression | 1389 | 10 | 1989 |
Soybean (Large) | Classification | 307 | 35 | 1988 |
Soybean (Small) | Classification | 47 | 35 | 1987 |
Challenger USA Space Shuttle O-Ring | Regression | 23 | 4 | 1993 |
Low Resolution Spectrometer | Classification | 531 | 102 | 1988 |
Spambase | Classification | 4601 | 57 | 1999 |
SPECT Heart | Classification | 267 | 22 | 2001 |
SPECTF Heart | Classification | 267 | 44 | 2001 |
Sponge | Clustering | 76 | 45 | |
Statlog Project | | | | 1992 |
Student Loan Relational | | 1000 | | 1993 |
Teaching Assistant Evaluation | Classification | 151 | 5 | 1997 |
Tic-Tac-Toe Endgame | Classification | 958 | 9 | 1991 |
Thyroid Disease | Classification | 7200 | 21 | 1987 |
Trains | Classification | 10 | 32 | 1994 |
University | Classification | 285 | 17 | 1988 |
Congressional Voting Records | Classification | 435 | 16 | 1987 |
Water Treatment Plant | Clustering | 527 | 38 | 1993 |
Waveform Database Generator (Version 1) | Classification | 5000 | 21 | 1988 |
Waveform Database Generator (Version 2) | Classification | 5000 | 40 | 1988 |
Wine | Classification | 178 | 13 | 1991 |
Yeast | Classification | 1484 | 8 | 1996 |
Zoo | Classification | 101 | 17 | 1990 |
Undocumented | | | | |
Twenty Newsgroups | | 20000 | | 1999 |
Australian Sign Language signs | Classification | 6650 | 15 | 1999 |
Australian Sign Language signs (High Quality) | Classification | 2565 | 22 | 2002 |
US Census Data (1990) | Clustering | 2458285 | 68 | |
Census-Income (KDD) | Classification | 299285 | 40 | 2000 |
Coil 1999 Competition Data | | 340 | 17 | 1999 |
Corel Image Features | | 68040 | 89 | 1999 |
E. Coli Genes | | | | 2001 |
EEG Database | | 122 | 4 | 1999 |
El Nino | | 178080 | 12 | 1999 |
Entree Chicago Recommendation Data | Recommender-Systems | 50672 | | 2000 |
CMU Face Images | Classification | 640 | | 1999 |
Insurance Company Benchmark (COIL 2000) | Regression, Description | 9000 | 86 | 2000 |
Internet Usage Data | | 10104 | 72 | 1999 |
IPUMS Census Database | | 256932 | 61 | 1999 |
Japanese Vowels | Classification | 640 | 12 | |
KDD Cup 1998 Data | Regression | 191779 | 481 | 1998 |
KDD Cup 1999 Data | Classification | 4000000 | 42 | 1999 |
M. Tuberculosis Genes | | | | 2001 |
Movie | | 10000 | | 1999 |
MSNBC.com Anonymous Web Data | | 989818 | | |
NSF Research Award Abstracts 1990-2003 | | 129000 | | 2003 |
Pioneer-1 Mobile Robot Data | | | | 1999 |
Pseudo Periodic Synthetic Time Series | | 100000 | | 1999 |
Reuters-21578 Text Categorization Collection | Classification | 21578 | 5 | 1997 |
Robot Execution Failures | Classification | 463 | 90 | 1999 |
Synthetic Control Chart Time Series | Classification, Clustering | 600 | | 1999 |
Syskill and Webert Web Page Ratings | Classification | 332 | 5 | 1998 |
UNIX User Data | | | | |
Volcanoes on Venus - JARtool experiment | Classification | | | |
Statlog (Australian Credit Approval) | Classification | 690 | 14 | |
Statlog (German Credit Data) | Classification | 1000 | 20 | 1994 |
Statlog (Heart) | Classification | 270 | 13 | |
Statlog (Landsat Satellite) | Classification | 6435 | 36 | 1993 |
Statlog (Image Segmentation) | Classification | 2310 | 19 | 1990 |
Statlog (Shuttle) | Classification | 58000 | 9 | |
Statlog (Vehicle Silhouettes) | Classification | 946 | 18 | |
Connectionist Bench (Nettalk Corpus) | | 20008 | 4 | |
Connectionist Bench (Sonar, Mines vs. Rocks) | Classification | 208 | 60 | |
Connectionist Bench (Vowel Recognition - Deterding Data) | Classification | 528 | 10 | |
Economic Sanctions | | | | |
Protein Data | | | | |
Cloud | | 1024 | 10 | 1989 |
CalIt2 Building People Counts | | 10080 | 4 | 2006 |
Dodgers Loop Sensor | | 50400 | 3 | 2006 |
Poker Hand | Classification | 1025010 | 11 | 2007 |
MAGIC Gamma Telescope | Classification | 19020 | 11 | 2007 |
UJI Pen Characters | Classification | 1364 | | 2007 |
Mammographic Mass | Classification | 961 | 6 | 2007 |
Forest Fires | Regression | 517 | 13 | 2008 |
Reuters Transcribed Subset | Classification | 200 | | 2008 |
Bag of Words | Clustering | 8000000 | 100000 | 2008 |
Concrete Compressive Strength | Regression | 1030 | 9 | 2007 |
Hill-Valley | Classification | 606 | 101 | 2008 |
Arcene | Classification | 900 | 10000 | 2008 |
Dexter | Classification | 2600 | 20000 | 2008 |
Dorothea | Classification | 1950 | 100000 | 2008 |
Gisette | Classification | 13500 | 5000 | 2008 |
Madelon | Classification | 4400 | 500 | 2008 |
Ozone Level Detection | Classification | 2536 | 73 | 2008 |
Abscisic Acid Signaling Network | Causal-Discovery | 300 | 43 | 2008 |
Parkinsons | Classification | 197 | 23 | 2008 |
Character Trajectories | Classification, Clustering | 2858 | 3 | 2008 |
Blood Transfusion Service Center | Classification | 748 | 5 | 2008 |
UJI Pen Characters (Version 2) | Classification | 11640 | | 2009 |
Semeion Handwritten Digit | Classification | 1593 | 256 | 2008 |
SECOM | Classification, Causal-Discovery | 1567 | 591 | 2008 |
Plants | Clustering | 22632 | 70 | 2008 |
Libras Movement | Classification, Clustering | 360 | 91 | 2009 |
Concrete Slump Test | Regression | 103 | 10 | 2009 |
Communities and Crime | Regression | 1994 | 128 | 2009 |
Acute Inflammations | Classification | 120 | 6 | 2009 |
Wine Quality | Classification, Regression | 4898 | 12 | 2009 |
URL Reputation | Classification | 2396130 | 3231961 | 2009 |
p53 Mutants | Classification | 16772 | 5409 | 2010 |
Parkinsons Telemonitoring | Regression | 5875 | 26 | 2009 |
Demospongiae | Classification | 503 | | 2010 |
Opinosis Opinion / Review | | 51 | | 2010 |
Breast Tissue | Classification | 106 | 10 | 2010 |
Cardiotocography | Classification | 2126 | 23 | 2010 |
Wall-Following Robot Navigation Data | Classification | 5456 | 24 | 2010 |
Spoken Arabic Digit | Classification | 8800 | 13 | 2010 |
Localization Data for Person Activity | Classification | 164860 | 8 | 2010 |
AutoUniv | Classification | | | 2010 |
Steel Plates Faults | Classification | 1941 | 27 | 2010 |
MiniBooNE particle identification | Classification | 130065 | 50 | 2010 |
YearPredictionMSD | Regression | 515345 | 90 | 2011 |
PEMS-SF | Classification | 440 | 138672 | 2011 |
OpinRank Review Dataset | | | | 2011 |
Relative location of CT slices on axial axis | Regression | 53500 | 386 | 2011 |
Online Handwritten Assamese Characters Dataset | Classification | 8235 | | 2011 |
PubChem Bioassay Data | Classification | | | 2011 |
Record Linkage Comparison Patterns | Classification | 5749132 | 12 | 2011 |
Communities and Crime Unnormalized | Regression | 2215 | 147 | 2011 |
Vertebral Column | Classification | 310 | 6 | 2011 |
EMG Physical Action Data Set | Classification | 10000 | 8 | 2011 |
Vicon Physical Action Data Set | Classification | 3000 | 27 | 2011 |
Amazon Commerce reviews set | Classification | 1500 | 10000 | 2011 |
Amazon Access Samples | Regression, Clustering, Causal-Discovery | 30000 | 20000 | 2011 |
Reuter_50_50 | Classification, Clustering | 2500 | 10000 | 2011 |
Farm Ads | Classification | 4143 | 54877 | 2011 |
DBWorld e-mails | Classification | 64 | 4702 | 2011 |
KEGG Metabolic Relation Network (Directed) | Classification, Regression, Clustering | 53414 | 24 | 2011 |
KEGG Metabolic Reaction Network (Undirected) | Classification, Regression, Clustering | 65554 | 29 | 2011 |
Bank Marketing | Classification | 45211 | 17 | 2012 |
YouTube Comedy Slam Preference Data | Classification | 1138562 | 3 | 2012 |
Gas Sensor Array Drift Dataset | Classification | 13910 | 128 | 2012 |
ILPD (Indian Liver Patient Dataset) | Classification | 583 | 10 | 2012 |
OPPORTUNITY Activity Recognition | Classification | 2551 | 242 | 2012 |
Nomao | Classification | 34465 | 120 | 2012 |
SMS Spam Collection | Classification, Clustering | 5574 | | 2012 |
Skin Segmentation | Classification | 245057 | 4 | 2012 |
Planning Relax | Classification | 182 | 13 | 2012 |
PAMAP2 Physical Activity Monitoring | Classification | 3850505 | 52 | 2012 |
Restaurant & consumer data | | 138 | 47 | 2012 |
CNAE-9 | Classification | 1080 | 857 | 2012 |
Individual household electric power consumption | Regression, Clustering | 2075259 | 9 | 2012 |
seeds | Classification, Clustering | 210 | 7 | 2012 |
Northix | Classification | 115 | 200 | 2012 |
QtyT40I10D100K | | 3960456 | 4 | 2012 |
Legal Case Reports | Classification | | | 2012 |
Human Activity Recognition Using Smartphones | Classification, Clustering | 10299 | 561 | 2012 |
One-hundred plant species leaves data set | Classification | 1600 | 64 | 2012 |
Energy efficiency | Classification, Regression | 768 | 8 | 2012 |
Yacht Hydrodynamics | Regression | 308 | 7 | 2013 |
Fertility | Classification, Regression | 100 | 10 | 2013 |
Daphnet Freezing of Gait | Classification | 237 | 9 | 2013 |
3D Road Network (North Jutland, Denmark) | Regression, Clustering | 434874 | 4 | 2013 |
ISTANBUL STOCK EXCHANGE | Classification, Regression | 536 | 8 | 2013 |
Buzz in social media | Regression, Classification | 140000 | 77 | 2013 |
First-order theorem proving | Classification | 6118 | 51 | 2013 |
Wearable Computing: Classification of Body Postures and Movements (PUC-Rio) | Classification | 165632 | 18 | 2013 |
Gas sensor arrays in open sampling settings | Classification | 18000 | 1950000 | 2013 |
Climate Model Simulation Crashes | Classification | 540 | 18 | 2013 |
MicroMass | Classification | 931 | 1300 | 2013 |
QSAR biodegradation | Classification | 1055 | 41 | 2013 |
BLOGGER | Classification | 100 | 6 | 2013 |
Daily and Sports Activities | Classification, Clustering | 9120 | 5625 | 2013 |
User Knowledge Modeling | Classification, Clustering | 403 | 5 | 2013 |
Reuters RCV1 RCV2 Multilingual, Multiview Text Categorization Test collection | Classification | 111740 | | 2013 |
NYSK | Clustering | 10421 | 7 | 2013 |
Turkiye Student Evaluation | Classification, Clustering | 5820 | 33 | 2013 |
User Knowledge Modeling Data (Students' Knowledge Levels on DC Electrical Machines) | Classification | 403 | 5 | 2013 |
EEG Eye State | Classification | 14980 | 15 | 2013 |
Physicochemical Properties of Protein Tertiary Structure | Regression | 45730 | 9 | 2013 |
seismic-bumps | Classification | 2584 | 19 | 2013 |
banknote authentication | Classification | 1372 | 5 | 2013 |
USPTO Algorithm Challenge, run by NASA-Harvard Tournament Lab and TopCoder Problem: Pat | Classification | 306 | 5 | 2013 |
YouTube Multiview Video Games Dataset | Classification, Clustering | 120000 | 1000000 | 2013 |
Gas Sensor Array Drift Dataset at Different Concentrations | Classification, Regression, Clustering, Causal-Discovery | 13910 | 129 | 2013 |
Activities of Daily Living (ADLs) Recognition Using Binary Sensors | Classification, Clustering | 2747 | | 2013 |
SkillCraft1 Master Table Dataset | Regression | 3395 | 20 | 2013 |
Weight Lifting Exercises monitored with Inertial Measurement Units | Classification | 39242 | 152 | 2013 |
SML2010 | Regression | 4137 | 24 | 2014 |
Bike Sharing Dataset | Regression | 17389 | 16 | 2013 |
Predict keywords activities in a online social media | | 51 | 35 | 2013 |
Thoracic Surgery Data | Classification | 470 | 17 | 2013 |
EMG dataset in Lower Limb | | 132 | 5 | 2014 |
SUSY | Classification | 5000000 | 18 | 2014 |
HIGGS | Classification | 11000000 | 28 | 2014 |
Qualitative_Bankruptcy | Classification | 250 | 7 | 2014 |
LSVT Voice Rehabilitation | Classification | 126 | 309 | 2014 |
Dataset for ADL Recognition with Wrist-worn Accelerometer | Classification, Clustering | | 3 | 2014 |
Wilt | Classification | 4889 | 6 | 2014 |
User Identification From Walking Activity | Classification, Clustering | | | 2014 |
Activity Recognition from Single Chest-Mounted Accelerometer | Classification, Clustering | | | 2014 |
Leaf | Classification | 340 | 16 | 2014 |
Dresses_Attribute_Sales | Classification, Clustering | 501 | 13 | 2014 |
Tamilnadu Electricity Board Hourly Readings | Classification, Regression, Clustering | 45781 | 5 | 2013 |
Airfoil Self-Noise | Regression | 1503 | 6 | 2014 |
Wholesale customers | Classification, Clustering | 440 | 8 | 2014 |
Twitter Data set for Arabic Sentiment Analysis | Classification | 2000 | 2 | 2014 |
Combined Cycle Power Plant | Regression | 9568 | 4 | 2014 |
Urban Land Cover | Classification | 168 | 148 | 2014 |
Diabetes 130-US hospitals for years 1999-2008 | Classification, Clustering | 100000 | 55 | 2014 |
Bach Choral Harmony | Classification | 5665 | 17 | 2014 |
StoneFlakes | Classification, Clustering, Causal-Discovery | 79 | 8 | 2014 |
Tennis Major Tournament Match Statistics | Classification, Regression, Clustering | 127 | 42 | 2014 |
Parkinson Speech Dataset with Multiple Types of Sound Recordings | Classification, Regression | 1040 | 26 | 2014 |
Gesture Phase Segmentation | Classification, Clustering | 9900 | 50 | 2014 |
Perfume Data | Classification, Clustering | 560 | 2 | 2014 |
BlogFeedback | Regression | 60021 | 281 | 2014 |
REALDISP Activity Recognition Dataset | Classification | 1419 | 120 | 2014 |
Newspaper and magazine images segmentation dataset | Classification | 101 | | 2014 |
AAAI 2014 Accepted Papers | Clustering | 399 | 6 | 2014 |
Gas sensor array under flow modulation | Classification, Regression | 58 | 120432 | 2014 |
Gas sensor array exposed to turbulent gas mixtures | Classification, Regression | 180 | 150000 | 2014 |
UJIIndoorLoc | Classification, Regression | 21048 | 529 | 2014 |
Sentence Classification | Classification | | | 2014 |
Dow Jones Index | Classification, Clustering | 750 | 16 | 2014 |
sEMG for Basic Hand movements | Classification | 3000 | 2500 | 2014 |
AAAI 2013 Accepted Papers | Clustering | 150 | 5 | 2014 |
Geographical Original of Music | Classification, Regression | 1059 | 68 | 2014 |
Condition Based Maintenance of Naval Propulsion Plants | Regression | 11934 | 16 | 2014 |
Grammatical Facial Expressions | Classification, Clustering | 27965 | 100 | 2014 |
NoisyOffice | Classification, Regression | 216 | 216 | 2015 |
MHEALTH Dataset | Classification | 120 | 23 | 2014 |
Student Performance | Classification, Regression | 649 | 33 | 2014 |
ElectricityLoadDiagrams20112014 | Regression, Clustering | 370 | 140256 | 2015 |
Gas sensor array under dynamic gas mixtures | Classification, Regression | 4178504 | 19 | 2015 |
microblogPCU | Classification, Causal-Discovery | 221579 | 20 | 2015 |
Firm-Teacher_Clave-Direction_Classification | Classification | 10800 | 20 | 2015 |
Dataset for Sensorless Drive Diagnosis | Classification | 58509 | 49 | 2015 |
TV News Channel Commercial Detection Dataset | Classification, Clustering | 129685 | 12 | 2015 |
Phishing Websites | Classification | 2456 | 30 | 2015 |
Greenhouse Gas Observing Network | Regression | 2921 | 5232 | 2015 |
Diabetic Retinopathy Debrecen Data Set | Classification | 1151 | 20 | 2014 |
HIV-1 protease cleavage | Classification | 6590 | 1 | 2015 |
Sentiment Labelled Sentences | Classification | 3000 | | 2015 |
Online News Popularity | Classification, Regression | 39797 | 61 | 2015 |
Forest type mapping | Classification | 326 | 27 | 2015 |
wiki4HE | Regression, Clustering, Causal-Discovery | 913 | 53 | 2015 |
Online Video Characteristics and Transcoding Time Dataset | Regression | 168286 | 11 | 2015 |
Chronic_Kidney_Disease | Classification | 400 | 25 | 2015 |
Machine Learning based ZZAlpha Ltd. Stock Recommendations 2012-2014 | Classification | 314080 | 0 | 2015 |
Folio | Classification, Clustering | 637 | 20 | 2015 |
Taxi Service Trajectory - Prediction Challenge, ECML PKDD 2015 | Clustering, Causal-Discovery | 1710671 | 9 | 2015 |
Cuff-Less Blood Pressure Estimation | Classification, Regression | 12000 | 3 | 2015 |
Smartphone-Based Recognition of Human Activities and Postural Transitions | Classification | 10929 | 561 | 2015 |
Mice Protein Expression | Classification, Clustering | 1080 | 82 | 2015 |
UJIIndoorLoc-Mag | Classification, Regression, Clustering | 40000 | 13 | 2015 |
Heterogeneity Activity Recognition | Classification, Clustering | 43930257 | 16 | 2015 |
Educational Process Mining (EPM): A Learning Analytics Data Set | Classification, Regression, Clustering | 230318 | 13 | 2015 |
HEPMASS | Classification | 10500000 | 28 | 2016 |
Indoor User Movement Prediction from RSS data | Classification | 13197 | 4 | 2016 |
Open University Learning Analytics dataset | Classification, Regression, Clustering | | | 2015 |
default of credit card clients | Classification | 30000 | 24 | 2016 |
Mesothelioma’s disease data set | Classification | 324 | 34 | 2016 |
Online Retail | Classification, Clustering | 541909 | 8 | 2015 |
SIFT10M | Causal-Discovery | 11164866 | 128 | 2016 |
GPS Trajectories | Classification, Regression | 163 | 15 | 2016 |
Detect Malacious Executable(AntiVirus) | Classification | 373 | 513 | 2016 |
Occupancy Detection | Classification | 20560 | 7 | 2016 |
Improved Spiral Test Using Digitized Graphics Tablet for Monitoring Parkinson’s Disease | Classification, Regression, Clustering | 40 | 7 | 2016 |
News Aggregator | Classification, Clustering | 422937 | 5 | 2016 |
Air Quality | Regression | 9358 | 15 | 2016 |
Twin gas sensor arrays | Classification, Regression | 640 | 480000 | 2016 |
Gas sensors for home activity monitoring | Classification | 919438 | 11 | 2016 |
Facebook Comment Volume Dataset | Regression | 40949 | 54 | 2016 |
Smartphone Dataset for Human Activity Recognition (HAR) in Ambient Assisted Living (AAL) | Classification | 5744 | 561 | 2016 |
Polish companies bankruptcy data | Classification | 10503 | 64 | 2016 |
Activity Recognition system based on Multisensor data fusion (AReM) | Classification | 42240 | 6 | 2016 |
Dota2 Games Results | Classification | 102944 | 116 | 2016 |
Facebook metrics | Regression | 500 | 19 | 2016 |
UbiqLog (smartphone lifelogging) | Causal-Discovery | 9782222 | | 2016 |
NIPS Conference Papers 1987-2015 | Clustering | 11463 | 5812 | 2016 |
HTRU2 | Classification, Clustering | 17898 | 9 | 2017 |
Drug consumption (quantified) | Classification | 1885 | 32 | 2016 |
Appliances energy prediction | Regression | 19735 | 29 | 2017 |
Miskolc IIS Hybrid IPS | Classification, Clustering, Causal-Discovery | 1540 | 67 | 2016 |
KDC-4007 dataset Collection | Classification, Regression | 4007 | | 2017 |
Geo-Magnetic field and WLAN dataset for indoor localisation from wristband and smartphone | Classification, Regression, Clustering | 153540 | 25 | 2017 |
DrivFace | Classification, Regression, Clustering | 606 | 6400 | 2016 |
Website Phishing | Classification | 1353 | 10 | 2016 |
YouTube Spam Collection | Classification | 1956 | 5 | 2017 |
Beijing PM2.5 Data | Regression | 43824 | 13 | 2017 |
Cargo 2000 Freight Tracking and Tracing | Classification, Regression | 3942 | 98 | 2016 |
Cervical cancer (Risk Factors) | Classification | 858 | 36 | 2017 |
Quality Assessment of Digital Colposcopies | Classification | 287 | 69 | 2017 |
KASANDR | Causal-Discovery | 17764280 | 2158859 | 2017 |
FMA: A Dataset For Music Analysis | Classification, Clustering | 106574 | 518 | 2017 |
Air quality | Regression | 9358 | 15 | 2016 |
Epileptic Seizure Recognition | Classification, Clustering | 11500 | 179 | 2017 |
Devanagari Handwritten Character Dataset | Classification | 92000 | | 2016 |
Stock portfolio performance | Regression | 315 | 12 | 2016 |
MoCap Hand Postures | Classification, Clustering | 78095 | 38 | 2016 |
Early biomarkers of Parkinson's disease based on natural connected speech | Classification, Regression | 130 | 65 | 2017 |
Data for Software Engineering Teamwork Assessment in Education Setting | Classification | 74 | 102 | 2017 |
PM2.5 Data of Five Chinese Cities | Regression | 52854 | 86 | 2017 |
Parkinson Disease Spiral Drawings Using Digitized Graphics Tablet | Classification, Regression, Clustering | 77 | 7 | 2017 |
Sales_Transactions_Dataset_Weekly | Clustering | 811 | 53 | 2017 |
Las Vegas Strip | Classification, Regression | 504 | 20 | 2017 |
Eco-hotel | | 401 | 1 | 2017 |
MEU-Mobile KSD | Classification | 2856 | 71 | 2016 |
Crowdsourced Mapping | Classification | 10546 | 29 | 2016 |
gene expression cancer RNA-Seq | Classification, Clustering | 801 | 20531 | 2016 |
Hybrid Indoor Positioning Dataset from WiFi RSSI, Bluetooth and magnetometer | Classification | 1540 | 65 | 2016 |
chestnut – LARVIC | Classification, Clustering | 1451 | 3 | 2017 |
Burst Header Packet (BHP) flooding attack on Optical Burst Switching (OBS) Network | Classification | 1075 | 22 | 2017 |
Motion Capture Hand Postures | Classification, Clustering | 78095 | 38 | 2017 |
Anuran Calls (MFCCs) | Classification, Clustering | 7195 | 22 | 2017 |
TTC-3600: Benchmark dataset for Turkish text categorization | Classification, Clustering | 3600 | 4814 | 2017 |
Gastrointestinal Lesions in Regular Colonoscopy | Classification | 76 | 698 | 2016 |
Daily Demand Forecasting Orders | Regression | 60 | 13 | 2017 |
Paper Reviews | Classification, Regression | 405 | 10 | 2017 |
extention of Z-Alizadeh sani dataset | Classification | 303 | 59 | 2017 |
Z-Alizadeh Sani | Classification | 303 | 56 | 2017 |
Dynamic Features of VirusShare Executables | Classification, Regression | 107888 | 482 | 2017 |
IDA2016Challenge | Classification | 76000 | 171 | 2017 |
DSRC Vehicle Communications | Clustering | 10000 | 5 | 2017 |
Mturk User-Perceived Clusters over Images | Clustering | 180 | 500 | 2016 |
Character Font Images | Classification | 745000 | 411 | 2016 |
DeliciousMIL: A Data Set for Multi-Label Multi-Instance Learning with Instance Labels | Classification | 12234 | 8519 | 2016 |
Autistic Spectrum Disorder Screening Data for Children | Classification | 292 | 21 | 2017 |
Autistic Spectrum Disorder Screening Data for Adolescent | Classification | 104 | 21 | 2017 |
APS Failure at Scania Trucks | Classification | 60000 | 171 | 2017 |
Wireless Indoor Localization | Classification | 2000 | 7 | 2017 |
HCC Survival | Classification | 165 | 49 | 2017 |
CSM (Conventional and Social Media Movies) Dataset 2014 and 2015 | Classification, Regression | 217 | 12 | 2017 |
University of Tehran Question Dataset 2016 (UTQD.2016) | Classification | 1175 | 3 | 2017 |
Autism Screening Adult | Classification | 704 | 21 | 2017 |
Activity recognition with healthy older people using a batteryless wearable sensor | Classification | 75128 | 9 | 2016 |
Immunotherapy Dataset | Classification | 90 | 8 | 2018 |
Cryotherapy Dataset | Classification | 90 | 7 | 2018 |
OCT data & Color Fundus Images of Left & Right Eyes | Classification | 50 | 2 | 2016 |
Discrete Tone Image Dataset | Classification | 71 | 11 | 2018 |
News Popularity in Multiple Social Media Platforms | Regression | 93239 | 11 | 2018 |
Ultrasonic flowmeter diagnostics | Classification | 540 | 173 | 2018 |
ICMLA 2014 Accepted Papers Data Set | Classification, Clustering | 105 | 5 | 2018 |
BLE RSSI Dataset for Indoor localization and Navigation | Classification, Clustering | 6611 | 15 | 2018 |
Container Crane Controller Data Set | Classification, Regression | 15 | 3 | 2018 |
Residential Building Data Set | Regression | 372 | 105 | 2018 |
Health News in Twitter | Clustering | 58000 | 25000 | 2018 |
chipseq | Classification | 4960 | | 2018 |
SGEMM GPU kernel performance | Regression | 241600 | 18 | 2018 |
Repeat Consumption Matrices | Clustering | 130000 | 21000 | 2018 |
detection_of_IoT_botnet_attacks_N_BaIoT | Classification, Clustering | 7062606 | 115 | 2018 |
Absenteeism at work | Classification, Clustering | 740 | 21 | 2018 |
SCADI | Classification, Clustering | 70 | 206 | 2018 |
Condition monitoring of hydraulic systems | Classification, Regression | 2205 | 43680 | 2018 |
Carbon Nanotubes | Regression | 10721 | 8 | 2018 |
Optical Interconnection Network | Classification, Regression | 640 | 10 | 2018 |
Sports articles for objectivity analysis | Classification | 1000 | 59 | 2018 |
Breast Cancer Coimbra | Classification | 116 | 10 | 2018 |
GNFUV Unmanned Surface Vehicles Sensor Data | Regression | 1672 | 5 | 2018 |
Dishonest Internet users Dataset | Classification, Clustering | 322 | 5 | 2018 |
Victorian Era Authorship Attribution | Classification | 93600 | 1000 | 2018 |
Simulated Falls and Daily Living Activities Data Set | Classification | 3060 | 138 | 2018 |
Multimodal Damage Identification for Humanitarian Computing | Classification | 5879 | | 2018 |
EEG Steady-State Visual Evoked Potential Signals | Classification, Regression | 9200 | 16 | 2018 |
Roman Urdu Data Set | Classification | 20000 | 2 | 2018 |
Avila | Classification | 20867 | 10 | 2018 |
PANDOR | Recommendation | | | 2018 |
Drug Review Dataset (Druglib.com) | Classification, Regression, Clustering | 4143 | 8 | 2018 |
Drug Review Dataset (Drugs.com) | Classification, Regression, Clustering | 215063 | 6 | 2018 |
Physical Unclonable Functions | Classification | 6000000 | 129 | 2018 |
Superconductivty Data | Regression | 21263 | 81 | 2018 |
WESAD (Wearable Stress and Affect Detection) | Classification, Regression | 63000000 | 12 | 2018 |
GNFUV Unmanned Surface Vehicles Sensor Data Set 2 | Regression | 10190 | 6 | 2018 |
Student Academics Performance | Classification | 300 | 22 | 2018 |
Online Shoppers Purchasing Intention Dataset | Classification, Clustering | 12330 | 18 | 2018 |
PMU-UD | Classification | 5180 | 9 | 2018 |
Parkinson's Disease Classification | Classification | 756 | 754 | 2018 |
Electrical Grid Stability Simulated Data | Classification, Regression | 10000 | 14 | 2018 |
Caesarian Section Classification Dataset | Classification | 80 | 5 | 2018 |
BAUM-1 | Classification | 1184 | | 2018 |
BAUM-2 | Classification | 1047 | | 2018 |
Audit Data | Classification | 777 | 18 | 2018 |
BuddyMove Data Set | Classification, Clustering | 249 | 7 | 2018 |
Real estate valuation data set | Regression | 414 | 7 | 2018 |
Early biomarkers of Parkinson’s disease based on natural connected speech Data Set | Classification | | | 2018 |
Somerville Happiness Survey | Classification | 143 | 7 | 2018 |
2.4 GHZ Indoor Channel Measurements | Classification | 7840 | 5 | 2018 |
EMG data for gestures | Classification | 30000 | 6 | 2019 |
Parking Birmingham | Classification, Regression, Clustering | 35717 | 4 | 2019 |
Behavior of the urban traffic of the city of Sao Paulo in Brazil | Classification, Regression | 135 | 18 | 2018 |
Travel Reviews | Classification, Clustering | 980 | 11 | 2018 |
Tarvel Review Ratings | Classification, Clustering | 5456 | 25 | 2018 |
Rice Leaf Diseases | Classification | 120 | | 2019 |
Gas sensor array temperature modulation | Classification, Regression | 4095000 | 20 | 2019 |
Facebook Live Sellers in Thailand | Clustering | 7051 | 12 | 2019 |
Parkinson Dataset with replicated acoustic features | Classification | 240 | 46 | 2019 |
Metro Interstate Traffic Volume | Regression | 48204 | 9 | 2019 |
Query Analytics Workloads Dataset | Regression, Clustering | 260000 | 8 | 2019 |
Wave Energy Converters | Regression | 288000 | 49 | 2019 |
PPG-DaLiA | Regression | 8300000 | 11 | 2019 |
Alcohol QCM Sensor Dataset | Classification, Regression, Clustering | 125 | 8 | 2019 |
Divorce Predictors data set | Classification | 170 | 54 | 2019 |
Incident management process enriched event log | Regression, Clustering | 141712 | 36 | 2019 |
Opinion Corpus for Lebanese Arabic Reviews (OCLAR) | Classification | 3916 | 3916 | 2019 |
MEx | Classification, Clustering | 6262 | 710 | 2019 |
Beijing Multi-Site Air-Quality Data | Regression | 420768 | 18 | 2019 |
Online Retail II | Classification, Regression, Clustering | 1067371 | 8 | 2019 |
Hepatitis C Virus (HCV) for Egyptian patients | Classification | 1385 | 29 | 2019 |
QSAR fish toxicity | Regression | 908 | 7 | 2019 |
QSAR aquatic toxicity | Regression | 546 | 9 | 2019 |
Human Activity Recognition from Continuous Ambient Sensor Data | Classification | 13956534 | 37 | 2019 |
WISDM Smartphone and Smartwatch Activity and Biometrics Dataset | Classification | 15630426 | 6 | 2019 |
QSAR oral toxicity | Classification | 8992 | 1024 | 2019 |
QSAR androgen receptor | Classification | 1687 | 1024 | 2019 |
QSAR Bioconcentration classes dataset | Classification, Regression | 779 | 14 | 2019 |
QSAR fish bioconcentration factor (BCF) | Regression | 1056 | 7 | 2019 |
A study of Asian Religious and Biblical Texts | Classification, Clustering | 590 | 8265 | 2019 |
Real-time Election Results: Portugal 2019 | Regression | 21643 | 29 | 2019 |
Bias correction of numerical prediction model temperature forecast | Regression | 7750 | 25 | 2020 |
Bar Crawl: Detecting Heavy Drinking | Classification, Regression | 14057567 | 3 | 2020 |
Kitsune Network Attack Dataset | Classification, Clustering, Causal-Discovery | 27170754 | 115 | 2019 |
Shoulder Implant X-Ray Manufacturer Classification | Classification | 597 | 1 | 2020 |
Speaker Accent Recognition | Classification | 329 | 12 | 2020 |
Heart failure clinical records | Classification, Regression, Clustering | 299 | 13 | 2020 |
Deepfakes: Medical Image Tamper Detection | Classification | 20000 | 200000 | 2020 |
selfBACK | Classification, Clustering | 26136 | 6 | 2020 |
South German Credit | Classification, Regression, Clustering | 1000 | 21 | 2019 |
Exasens | Classification, Clustering | 399 | 4 | 2020 |
Swarm Behaviour | Classification | 24017 | 2400 | 2020 |
Crop mapping using fused optical-radar data set | Classification | 325834 | 175 | 2020 |
BitcoinHeistRansomwareAddressDataset | Classification, Clustering | 2916697 | 10 | 2020 |
Facebook Large Page-Page Network | Classification | 22470 | 4714 | 2020 |
Amphibians | Classification | 189 | 23 | 2020 |
Early stage diabetes risk prediction dataset. | Classification | 520 | 17 | 2020 |
Turkish Spam V01 | Classification | 826 | 2 | 2019 |
Stock keeping units | Clustering | 2279 | 9 | 2019 |
Demand Forecasting for a store | Regression | 28764 | 8 | 2019 |
Detect Malware Types | Classification | 7107 | 280 | 2019 |
Youtube cookery channels viewers comments in Hinglish | Classification | 9800 | 3 | 2019 |
Pedestrian in Traffic Dataset | Classification, Regression, Causal-Discovery | 4760 | 14 | 2019 |
Cervical Cancer Behavior Risk | Classification, Clustering | 72 | 19 | 2019 |
Sattriya_Dance_Single_Hand_Gestures Dataset | Classification | 1450 | | 2019 |
3W dataset | Classification, Clustering | 1984 | 8 | 2019 |
Malware static and dynamic features VxHeaven and Virus Total | Classification | 2955 | 1087 | 2019 |
Internet Firewall Data | Classification | 65532 | 12 | 2019 |
User Profiling and Abusive Language Detection Dataset | Classification | 65919 | 3 | 2019 |
Estimation of obesity levels based on eating habits and physical condition | Classification, Regression, Clustering | 2111 | 17 | 2019 |
Rice (Cammeo and Osmancik) | Classification | 3810 | 8 | 2019 |
Vehicle routing and scheduling problems | Clustering | 18 | 9 | 2019 |
Algerian Forest Fires Dataset | Classification, Regression | 244 | 12 | 2019 |
Breath Metabolomics | Classification, Clustering | 104 | 1656 | 2019 |
Horton General Hospital | Causal-Discovery | 139 | 6 | 2019 |
UrbanGB, urban road accidents coordinates labelled by the urban center | Clustering | 360177 | 2 | 2019 |
Gas Turbine CO and NOx Emission Data Set | Regression, Clustering | 36733 | 11 | 2019 |
Activity recognition using wearable physiological measurements | Classification | 4480 | 533 | 2019 |
clickstream data for online shopping | Classification, Regression, Clustering | 165474 | 14 | 2019 |
CNNpred: CNN-based stock market prediction using a diverse set of variables | Classification, Regression | 1985 | 84 | 2019 |
Apartment for rent classified | Classification, Regression, Clustering | 10000 | 22 | 2019 |
Simulated Data set of Iraqi tourism places | Classification, Clustering | 232 | 16 | 2020 |
Nasarian CAD Dataset | Classification | 150 | 52 | 2020 |
Monolithic Columns in Troad and Mysia Region | Classification | 11 | 19 | 2020 |
Seoul Bike Sharing Demand | Regression | 8760 | 14 | 2020 |
Person Classification Gait Data | Classification | 48 | 321 | 2020 |
Shill Bidding Dataset | Classification, Clustering | 6321 | 13 | 2020 |
Iranian Churn Dataset | Classification, Regression | 3150 | 13 | 2020 |
Unmanned Aerial Vehicle (UAV) Intrusion Detection | Classification | 17256 | 55 | 2020 |
Bone marrow transplant: children | Classification, Regression | 187 | 39 | 2020 |
COVID-19 Surveillance | Classification | 14 | 7 | 2020 |
Refractive errors | Classification | 467 | 79 | 2020 |
CLINC150 | Classification | 23700 | | 2020 |
HCV data | Classification, Clustering | 615 | 14 | 2020 |
Taiwanese Bankruptcy Prediction | Classification | 6819 | 96 | 2020 |
South German Credit (UPDATE) | Classification, Regression, Clustering | 1000 | 21 | 2020 |
IIWA14-R820-Gazebo-Dataset-10Trajectories | Regression | | | 2020 |
Guitar Chords finger positions | Classification | 2633 | 5 | 2020 |
Russian Corpus of Biographical Texts | Classification | 200 | 2 | 2020 |
Codon usage | Classification, Clustering | 13028 | 69 | 2020 |
Intelligent Media Accelerometer and Gyroscope (IM-AccGyro) Dataset | Classification | 800 | 9 | 2020 |
Myocardial infarction complications | Classification | 1700 | 124 | 2020 |
Hungarian Chickenpox Cases | Regression | 521 | 20 | 2021 |
Simulated data for survival modelling | Regression | 120000 | 25 | 2018 |
Student Performance on an entrance examination | Classification | 666 | 11 | 2018 |
Chemical Composition of Ceramic Samples | Classification, Clustering | 88 | 19 | 2019 |
Labeled Text Forum Threads Dataset | Classification | 200 | 9 | 2019 |
BLE RSSI dataset for Indoor localization | Classification | 23570 | 5 | 2019 |
Basketball dataset | Classification | 10000 | 7 | 2019 |
GitHub MUSAE | Classification | 37700 | 4006 | 2019 |
Anticancer peptides | Classification | 1850 | 2 | 2019 |
Gender by Name | Classification, Clustering | 147270 | 4 | 2020 |
Shoulder Implant Manufacture Classification | Classification | 597 | 1 | 2020 |
LastFM Asia Social Network | Classification | 7624 | 7842 | 2020 |
Wheat kernels | Classification | 314 | 15 | 2020 |
Productivity Prediction of Garment Employees | Classification, Regression | 1197 | 15 | 2020 |
Multi-view Brain Networks | Classification, Clustering | 70 | 70 | 2020 |
Wisesight Sentiment Corpus | Classification | 26737 | 4 | 2020 |
AI4I 2020 Predictive Maintenance Dataset | Classification, Regression, Causal-Discovery | 10000 | 14 | 2020 |
Dry Bean Dataset | Classification | 13611 | 17 | 2020 |
in-vehicle coupon recommendation | Classification | 12684 | 23 | 2020 |
Gait Classification | Classification | 48 | 321 | 2020 |
Wikipedia Math Essentials | Regression | 731 | 1068 | 2021 |
Synchronous Machine Data Set | Regression | 557 | 5 | 2021 |
Average Localization Error (ALE) in sensor node localization process in WSNs | Regression | 107 | 6 | 2021 |
9mers from cullpdb | Classification, Regression | 158716 | 4 | 2021 |
TamilSentiMix | Classification | 15744 | | 2021 |
Accelerometer | Classification, Regression | 153000 | 5 | 2021 |
Pedal Me Bicycle Deliveries | Regression | 36 | 15 | 2021 |
Turkish Headlines Dataset | Classification, Clustering | 4200 | 7 | 2021 |
Secondary Mushroom Dataset | Classification | 61069 | 21 | 2021 |
Power consumption of Tetouan city | Regression | 52417 | 9 | 2021 |
Raisin Dataset | Classification | 900 | 8 | 2021 |
Steel Industry Energy Consumption Dataset | Regression | 35040 | 11 | 2021 |
Gender Gap in Spanish WP | Classification | 4746 | 21 | 2021 |
Non verbal tourists data | Classification, Clustering | 73 | 22 | 2021 |
Roman Urdu Sentiment Analysis Dataset (RUSAD) | Classification | 11000 | 2 | 2021 |
TUANDROMD ( Tezpur University Android Malware Dataset) | Classification | 4465 | 241 | 2021 |
Higher Education Students Performance Evaluation Dataset | Classification | 145 | 33 | 2021 |
Risk Factor prediction of Chronic Kidney Disease | Classification, Regression | 202 | 29 | 2021 |
Lab Test | Classification, Regression, Clustering | 221 | 7 | 2021 |
Rocket League Skillshots Data Set | Classification | 298 | | 2020 |
Sepsis survival minimal clinical records | Classification | 110341 | 4 | 2020 |
Water Quality Prediction | Regression | 705 | 11 | 2020 |
Traffic Flow Forecasting | Regression | 2101 | 47 | 2020 |
sentiment analysis in Saudi Arabia about distance education during Covid-19 | Classification | 1765 | 10 | 2020 |
Kain Tradisional Sambas | Classification, Clustering | 150 | 3 | 2020 |
Image Recognition Task Execution Times in Mobile Edge Computing | Regression | 4000 | 2 | 2020 |
REWEMA | Classification | 6272 | 632 | 2020 |
REJAFADA | Classification | 1996 | 6826 | 2020 |
Steel Industry Energy Consumption Dataset | Regression | 35040 | 11 | 2020 |
Influenza outbreak event prediction via Twitter data | Classification | 75840 | 525 | 2020 |
Turkish Music Emotion Dataset | Classification | 400 | 50 | 2020 |
Maternal Health Risk Data Set | Classification | 1014 | 7 | 2020 |
Room Occupancy Estimation | Classification | 10129 | 16 | 2021 |
Image Recognition Task Execution Times in Mobile Edge Computing | Regression | 4000 | 2 | 2021 |
4 replies
-
xsmile
Recommender systems
Book-Crossing Dataset (uni-freiburg.de)
Book-Crossing Dataset ... mined by Cai-Nicolas Ziegler, DBIS Freiburg
Collected by Cai-Nicolas Ziegler in a 4-week crawl (August / September 2004) from the Book-Crossing community with kind permission from Ron Hornbaker, CTO of Humankind Systems. Contains 278,858 users (anonymized but with demographic information) providing 1,149,780 ratings (explicit / implicit) about 271,379 books.
Freely available for research use when acknowledged with the following reference (further details on the dataset are given in this publication):
- Improving Recommendation Lists Through Topic Diversification,
Cai-Nicolas Ziegler, Sean M. McNee, Joseph A. Konstan, Georg Lausen; Proceedings of the 14th International World Wide Web Conference (WWW '05), May 10-14, 2005, Chiba, Japan. To appear.
Download: [ PDF Pre-Print ]
As a courtesy, if you use the data, I would appreciate knowing your name, what research group you are in, and the publications that may result.
Format
The Book-Crossing dataset comprises 3 tables.
BX-Users
Contains the users. Note that user IDs (`User-ID`) have been anonymized and map to integers. Demographic data is provided (`Location`, `Age`) if available. Otherwise, these fields contain NULL-values.
BX-Books
Books are identified by their respective ISBN. Invalid ISBNs have already been removed from the dataset. Moreover, some content-based information is given (`Book-Title`, `Book-Author`, `Year-Of-Publication`, `Publisher`), obtained from Amazon Web Services. Note that in case of several authors, only the first is provided. URLs linking to cover images are also given, appearing in three different flavours (`Image-URL-S`, `Image-URL-M`, `Image-URL-L`), i.e., small, medium, large. These URLs point to the Amazon web site.
BX-Book-Ratings
Contains the book rating information. Ratings (`Book-Rating`) are either explicit, expressed on a scale from 1-10 (higher values denoting higher appreciation), or implicit, expressed by 0.
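The explicit/implicit distinction matters in practice, because a 0 is not a low score and should not be averaged with the 1-10 ratings. A minimal sketch of the split follows, assuming each rating has already been loaded as a `(User-ID, ISBN, Book-Rating)` tuple; the tuple layout, the function name `split_ratings`, and the sample rows are illustrative assumptions, not part of the dataset.

```python
from typing import Iterable, List, Tuple

# Assumed in-memory layout for one rating: (User-ID, ISBN, Book-Rating).
Rating = Tuple[int, str, int]


def split_ratings(rows: Iterable[Rating]) -> Tuple[List[Rating], List[Rating]]:
    """Separate explicit ratings (1-10) from implicit ones (0)."""
    explicit: List[Rating] = []
    implicit: List[Rating] = []
    for row in rows:
        rating = row[2]
        if rating == 0:
            implicit.append(row)      # implicit feedback, no score given
        elif 1 <= rating <= 10:
            explicit.append(row)      # explicit score on the 1-10 scale
        else:
            raise ValueError(f"unexpected Book-Rating value: {rating}")
    return explicit, implicit


# Made-up sample rows, for illustration only.
sample = [(1, "0000000001", 0), (1, "0000000002", 7), (2, "0000000001", 10)]
explicit, implicit = split_ratings(sample)
mean_explicit = sum(r[2] for r in explicit) / len(explicit)
```

Computing averages over the explicit list only avoids treating implicit interactions as the lowest possible rating.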
Download
Various data format flavours are available. Note that all downloads are in .ZIP format.
SQL dump: contains both schema information and data insertion statements. More convenient to use. Run as an SQL script.
CSV dump: data as comma-separated values (CSV). The first line contains column names. Fields are separated by semicolons, and all entries are in quotes.
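Because the fields are separated by semicolons rather than commas, a default CSV reader will not split the rows correctly. The sketch below shows one way to read such a file with Python's standard `csv` module. The helper name `read_bx_csv`, the `ISBN` column name, and the in-memory sample are assumptions for illustration; the real files may also need an explicit text encoding when opened.

```python
import csv
import io
from typing import Dict, List, TextIO


def read_bx_csv(stream: TextIO) -> List[Dict[str, str]]:
    """Read a semicolon-separated, fully quoted CSV with a header row."""
    reader = csv.DictReader(stream, delimiter=";", quotechar='"')
    return list(reader)


# Made-up sample in the described layout (header line, semicolons, quotes).
sample = io.StringIO(
    '"User-ID";"ISBN";"Book-Rating"\n'
    '"1";"0000000001";"0"\n'
    '"1";"0000000002";"7"\n'
)
rows = read_bx_csv(sample)

# All values arrive as strings; convert the rating explicitly.
ratings = [int(row["Book-Rating"]) for row in rows]
```

For a file on disk, the same function can be passed the handle returned by `open(path, newline="")`.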
Other Datasets
Offers collaborative filtering (CF) datasets for movies. MovieLens datasets come in different sizes. Also links to the older EachMovie dataset that can be obtained upon request from Compaq.
Dense dataset for joke recommending. Large numbers of users, but small number of items (around 100 jokes) only.
1 year ago
-
xsmile
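I often need the datasets commonly used in data analysis, and collecting them is a hassle. Taken from the internet, given back to the people.
Dataset name | Download | Description |
Tianchi used-car price prediction | https://pan.baidu.com/s/1n3qRxNhmmUMugkYiFSTKQQ (extraction code: 66ri) | |
Predicting user purchases from shops within a category | https://pan.baidu.com/s/1i0rueEFNFRPWOJU84qFa-g (extraction code: f5s5) | 2019 JDATA: predicting user purchases from shops within a category |
Iris dataset | https://pan.baidu.com/s/1ifmOH-yv_OKrMVqZnkpUWA (extraction code: hdxn) | |
Boston housing price dataset | https://pan.baidu.com/s/137GqU1s4ba03Rl-BLboXMg (extraction code: mp16) | |
Titanic survival dataset | https://pan.baidu.com/s/1F5eNI5c9YpJ7mkdq8r5v_w (extraction code: bf3u) | |
Amazon Employee Access data | https://pan.baidu.com/s/1I3P3bx3ZFxN-MI7ZnYqD_A (extraction code: 12a6) | Uses information tied to Amazon employee IDs to analyze and predict whether an employee's request to access a given resource ID will be granted. |
credit_card dataset | https://pan.baidu.com/s/1SjJJkXwaytQzgMNIgFnWbg (extraction code: u20n) | Bank card credit problem: classify each record as normal or abnormal using the 30 existing features and the class label. |
Movie review sentiment analysis | https://pan.baidu.com/s/1N0EQaF2LXI8KOs4uLZNxeQ (extraction code: 29kq) | |
Graduate admission dataset | https://pan.baidu.com/s/1dGSC2ARrNB0HELx4z6iGNw (extraction code: r8zy) | Which kinds of students are more likely to be admitted when applying to graduate school. |
From https://www.cnblogs.com/duoba/p/12404774.html
1 year ago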