hexis-mod-en

Model type: bert

Downstream task: binary text classification

Fine-tuning emissions: 0.48511694 kg CO₂-equivalent (CO₂eq)
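
For orientation, a minimal inference sketch using the Hugging Face transformers pipeline; the repository id "hexis-mod-en" is assumed from the card title and may not match the model's actual Hub path:

```python
# Minimal loading/inference sketch. The model id below is an assumption
# taken from the card title; substitute the real repository path.
from transformers import pipeline

classifier = pipeline(
    "text-classification",   # binary text classification head
    model="hexis-mod-en",    # assumed Hub id, adjust as needed
)

print(classifier("An example sentence to classify."))
# e.g. [{'label': 'LABEL_1', 'score': 0.97}]  (labels depend on the config)
```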


Benchmarks

  • Accuracy: 0.9701, F1: 0.9821
  • Accuracy: 0.8980, F1: 0.9173
  • Accuracy: 0.9854, F1: 0.9284
  • Self-consistency: Accuracy: 0.9991, F1: 0.9981


Notes
  • Self-consistency test: evaluation on the full training data.
  • Train-test splits: if a dataset is not already divided into train and test portions, a 70/30 train-test split is applied, as in the sketch below.
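
A minimal sketch of this evaluation protocol, assuming scikit-learn; the texts, labels, and predict stub are illustrative stand-ins, not part of this card:

```python
# Evaluation-protocol sketch: 70/30 split plus the self-consistency check.
# Data and classifier are toy stand-ins for the real training set and model.
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score

texts  = ["good", "bad", "fine", "awful", "great", "poor", "nice", "grim", "ok", "sad"]
labels = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]

# 70/30 train-test split, used when a dataset ships without its own test portion
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.3, random_state=0, stratify=labels
)

def predict(batch):
    """Stand-in for running the fine-tuned classifier."""
    return [1 if t in {"good", "fine", "great", "nice", "ok"} else 0 for t in batch]

# Benchmark metrics on the held-out 30%
print("Accuracy:", accuracy_score(y_test, predict(X_test)))
print("F1:", f1_score(y_test, predict(X_test)))

# Self-consistency test: the same metrics evaluated on all training data
print("Self-consistency accuracy:", accuracy_score(y_train, predict(X_train)))
print("Self-consistency F1:", f1_score(y_train, predict(X_train)))
```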

Debiasing

Test terms from StereoSet that are contained in the training data. The table below shows the difference in attention entropy before and after optimization of information flow. Unit: entropy bits.
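
As a rough illustration of the quantity being reported (a generic sketch, not the exact tooling behind this card), attention entropy in bits can be computed from a token's attention distribution, here with PyTorch:

```python
# Generic attention-entropy sketch; not the card's exact
# "optimization of information flow" procedure.
import torch

def attention_entropy_bits(attn_row: torch.Tensor) -> torch.Tensor:
    """Shannon entropy (base 2) of one token's attention distribution."""
    p = attn_row.clamp_min(1e-12)        # guard against log2(0)
    return -(p * p.log2()).sum()

# A sharply peaked distribution (token attends to one position) has low entropy ...
peaked = torch.tensor([0.94, 0.02, 0.02, 0.02])
# ... a spread-out one (token attends broadly to its surroundings) has high entropy.
uniform = torch.full((4,), 0.25)

print(attention_entropy_bits(peaked).item())   # ≈ 0.42 bits
print(attention_entropy_bits(uniform).item())  # 2.0 bits
```

The table lists, per term, the change in this quantity between the two model states.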

Term           Δ attention entropy [bits]
sheriff         0.001230363884272479
sir             0.0007713421891196629
priest          0.0006714202689683754
detective       0.0002851472351305242
broker          0.00026382138642353685
writer          0.00022288951551819022
her             0.00017682109610698914
bride           0.0001737024773714291
groom           0.0001726705814662523
chef            0.00014377749612130172
drawer          8.117581120724211e-05
journalist      7.085685215547405e-05
author          5.1250829957114725e-05
boyfriend       4.74672116381331e-05
poet            2.889308534495058e-05
grandmother     1.909007424577092e-05
waiter          1.7886195689731314e-05
umpire          3.439653017256022e-07
stepmother     -2.854912004322498e-05
female         -5.847410129335237e-05
boy            -7.223271336237646e-05
pianist        -0.00013070681465572883
he             -0.0001382605624584936
gentleman      -0.00016097576120758183
she            -0.0001723266161645267
him            -0.00017582359673249743
professor      -0.00019881194439739806
judge          -0.00020981883405261732
lady           -0.00022701709913889743
man            -0.00023630416228548868
his            -0.0002394981258020613
daughter       -0.00025900587219937845
dentist        -0.00026003776810455524
stepfather     -0.00026072569870800645
secretary      -0.00027654810258738417
carpenter      -0.00032126359181171245
dancer         -0.0003783618318981624
illustrator    -0.0003790497625016136
director       -0.00038799286034647926
architect      -0.00040243940301895457
husband        -0.0004077708651957014
father         -0.000412414396768997
church         -0.0004425113606699872
swimmer        -0.0004908384855624343
woman          -0.0004999191695281184
sociologist    -0.0005097565771573424
girlfriend     -0.0005243751024806805
boxer          -0.000568918609054146
photographer   -0.000577517741597286
coach          -0.0005916203189680357
clerk          -0.0006452789060372297
librarian      -0.0006500944202613881
army           -0.0006633944119285385
son            -0.0006762357831925339
doctor         -0.0006922874972734891
accountant     -0.0007092564521581917
wife           -0.0007182568775531314
comedian       -0.0007188874806065086
scientist      -0.0007274866131496486
editor         -0.0007666986575463673
girl           -0.0007780495125033121
chief          -0.0008385874056070181
athlete        -0.0008833028948313464
lawyer         -0.0009266425228487723
diplomat       -0.0009465925103488571
academic       -0.0009555356081937228
ladies         -0.0009617269836247837
painter        -0.0009992192015128743
singer         -0.001000251097418051
barber         -0.0010148696227413891
designer       -0.001084178631039098
teacher        -0.0011737815921386175
guard          -0.001178425123711913
analyst        -0.0013621025948333846
receptionist   -0.0013648543172471894
theologian     -0.0014621964976355348
cleaner        -0.0014701076995752237
counselor      -0.0014756111444028334
supervisor     -0.001493153374790839
artist         -0.0017862118118610521
daddy          -0.0024623329399527973
banker         -0.0025828354506575468
housekeeper    -0.002636150072425015


Notes

  • Higher entropy scores indicate increased information flow between the layers of the neural network. This also strengthens the interrelatedness between a term and its surroundings and reduces overfitting.

  • Higher is not automatically better: depending on the base model and the task-specific training data, training-time optimization can have equally valid reasons to reduce entropy scores.