hexis-mod-fr

Model type: bert

Downstream task: binary text classification

Fine-tuning emissions: 0.01658717 kg CO₂-equivalent (CO₂eq)
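
A minimal inference sketch, assuming the checkpoint can be loaded through the Hugging Face transformers sequence-classification API; the repository id "hexis-ai/hexis-mod-fr" is a placeholder, not the confirmed location of the weights.

```python
# Minimal inference sketch. Assumption: the fine-tuned checkpoint is published
# under a transformers-compatible repository id; "hexis-ai/hexis-mod-fr" is a
# placeholder, not the confirmed location.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "hexis-ai/hexis-mod-fr"  # hypothetical repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

text = "Ceci est un exemple de phrase à classer."
inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits  # shape (1, 2) for the binary task
predicted_class = logits.argmax(dim=-1).item()  # 0 or 1
print(predicted_class)
```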


Benchmarks

  • Accuracy: 0.9569219440353461
    F1: 0.841248303934871

  • Accuracy: 0.9945892769306444
    F1: 0.9945839487936977

  • Accuracy: 0.9191251271617498
    F1: 0.9515096065873742

  • Self-consistency
    Accuracy: 0.999791492910759
    F1: 0.999767549976755


Notes
  • Self-consistency test: evaluation on the full set of training data.
  • Train-test splits: if the dataset does not provide predefined train and test portions, a 70/30 train-test split is applied (see the sketch below).
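
The sketch below illustrates this evaluation protocol: a 70/30 train-test split, accuracy and F1 on the held-out portion, and a self-consistency pass over the training portion. The toy data and the scikit-learn baseline are stand-ins for the actual dataset and the fine-tuned BERT classifier; only the protocol is the point here.

```python
# Sketch of the evaluation protocol described in the notes above.
# Assumptions: toy texts/labels and a scikit-learn baseline stand in for the
# real dataset and the fine-tuned BERT model.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

texts = ["bon produit", "mauvais service", "très satisfait", "expérience décevante"] * 10
labels = [1, 0, 1, 0] * 10

# 70/30 split, used when the dataset has no predefined test portion.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.30, random_state=42, stratify=labels
)

clf = make_pipeline(TfidfVectorizer(), LogisticRegression()).fit(X_train, y_train)

# Held-out evaluation (the accuracy/F1 pairs under "Benchmarks" follow this pattern).
y_pred = clf.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print("F1:", f1_score(y_test, y_pred))

# Self-consistency test: evaluation on the full training data.
y_pred_train = clf.predict(X_train)
print("Self-consistency accuracy:", accuracy_score(y_train, y_pred_train))
print("Self-consistency F1:", f1_score(y_train, y_pred_train))
```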

Debiasing

Test terms from StereoSet that occur in the training data, showing the difference in attention entropy before and after optimization of information flow. Unit: entropy bits.

Term          Δ attention entropy (bits)
Chine          0.00016854299784554506
mari           0.00011884001174619555
garçon         0.0001085783802451422
Indien         0.00010043786810387584
Italien        9.011890905210777e-05
frère          7.636029698308368e-05
homme          5.967797984939198e-05
église         4.976031365006424e-05
Chili          4.953100344848671e-05
lui            1.2382750862121678e-05
chef           0.0
Islam          0.0
fille         -5.159479525884033e-06
père          -9.631028448316861e-06
professeur    -1.1694820258670474e-05
femme         -1.803360939065391e-05
il            -1.927297643065235e-05
fils          -2.6599983333019446e-05
médecin       -2.889308534495058e-05
conseiller    -2.9237050646676185e-05
auteur        -2.9581015948401786e-05
Français      -3.353661691824621e-05
armée         -4.488747187519108e-05
soldat        -7.0398231752319e-05
Venezuela     -7.704822758653489e-05
Allemagne     -7.921029519664932e-05
elle          -8.255167241414452e-05
son           -9.45904579745406e-05
Afrique       -0.00011048165491400714
Japon         -0.00011316458426772311
juge          -0.0001456119777300778
France        -0.00014789185030683434
Kenya         -0.0001827602303177242
Ghana         -0.00018711712413872758
Mohammed      -0.0002584325966960753
Guatemala     -0.000283599391272759
Paraguay      -0.0003363980650876389
Qatar         -0.0006686685465545707


Notes

  • Higher entropy scores indicate increased information flow between the layers of the neural network. This also strengthens the interrelation between a term and its surrounding context and reduces overfitting (a computation sketch follows these notes).

  • Higher is not automatically better: depending on the base model and the task-specific training data, training-time optimization can have equally valid reasons for reducing entropy scores.
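
For illustration, the sketch below shows one way to compute an attention-entropy score in bits with transformers. The aggregation behind the table above is not specified here, so averaging over layers, heads, and the term's query positions is an assumption, and camembert-base only stands in for the actual base checkpoint; the table reports the difference between such a score after and before optimization.

```python
# Hedged sketch: attention entropy (in bits) for a test term.
# Assumptions: averaging over layers, heads, and the term's token positions,
# and "camembert-base" as a stand-in checkpoint for a French BERT model.
import torch
from transformers import AutoModel, AutoTokenizer

def term_attention_entropy(model, tokenizer, sentence, term):
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs, output_attentions=True)
    # attentions: one (batch, heads, seq, seq) tensor per layer
    attn = torch.stack(outputs.attentions).squeeze(1)  # (layers, heads, seq, seq)
    # Shannon entropy of every attention distribution, in bits
    entropy = -(attn * torch.log2(attn.clamp_min(1e-12))).sum(dim=-1)  # (layers, heads, seq)
    # Positions in the sentence whose token ids belong to the test term
    term_ids = set(tokenizer(term, add_special_tokens=False)["input_ids"])
    positions = [i for i, t in enumerate(inputs["input_ids"][0].tolist()) if t in term_ids]
    return entropy[..., positions].mean().item()

tok = AutoTokenizer.from_pretrained("camembert-base")
base = AutoModel.from_pretrained("camembert-base")
score = term_attention_entropy(base, tok, "Le garçon lit un livre.", "garçon")
print(f"{score:.6f} bits")
# The table above lists score_after_optimization - score_before_optimization.
```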