In the 2018 PLoS Biology paper, we found that there is a striking shortage of women in many fields of science and medicine. Inspired by earlier studies that manually counted the number of men and women publishing in scientific journals, we developed computational approaches to infer the gender of 36 million authors publishing articles indexed on PubMed and arXiv over recent decades. By applying statistical models, we projected when the gender gap will close for dozens of fields of research. The results were bleak: many fields will take decades or centuries to reach gender parity, assuming that the current (slow) rate of progress continues. I have a couple more projects in the works on the massive dataset generated by the project, and I need some help! Drop me a line if you’d like to discuss a collaboration.
We made an interactive web app to help you fully explore the data. All the data (e.g. the gender ratio breakdown for thousands of journals, disciplines, and countries; data on the 36m individual papers) is freely available here, and I’d be very happy if it proves useful to someone.
I have also written a follow-up paper with Dr. Claire Morandin, using this dataset to look at patterns of collaboration within and between genders. We found that researchers have a strong tendency to collaborate with same-gendered colleagues, in essentially all fields of research. Interestingly, we found a weak negative correlation between journal impact factor and the excess of same-gender coauthorships, as predicted if mixed-gender teams produce higher-quality research.