Project description:Many studies show that open access (OA) articles (articles from scholarly journals made freely available to readers without requiring subscription fees) are downloaded, and presumably read, more often than closed access/subscription-only articles. Assertions that OA articles are also cited more often generate more controversy. Confounding factors (authors may self-select only the best articles to make OA; absence of an appropriate control group of non-OA articles with which to compare citation figures; conflation of pre-publication vs. published/publisher versions of articles, etc.) make demonstrating a real citation difference difficult. This study addresses those factors and shows that an open access citation advantage as high as 19% exists, even when articles are embargoed during some or all of their prime citation years. Not surprisingly, better (defined as above median) articles gain more when made OA.
Project description:Background. Attribution to the original contributor upon reuse of published data is important both as a reward for data creators and to document the provenance of research findings. Previous studies have found that papers with publicly available datasets receive a higher number of citations than similar studies without available data. However, few previous analyses have had the statistical power to control for the many variables known to predict citation rate, which has led to uncertain estimates of the "citation benefit". Furthermore, little is known about patterns in data reuse over time and across datasets. Method and Results. Here, we look at citation rates while controlling for many known citation predictors and investigate the variability of data reuse. In a multivariate regression on 10,555 studies that created gene expression microarray data, we found that studies that made data available in a public repository received 9% (95% confidence interval: 5% to 13%) more citations than similar studies for which the data was not made available. Date of publication, journal impact factor, open access status, number of authors, first and last author publication history, corresponding author country, institution citation history, and study topic were included as covariates. The citation benefit varied with date of dataset deposition: a citation benefit was most clear for papers published in 2004 and 2005, at about 30%. Authors published most papers using their own datasets within two years of their first publication on the dataset, whereas data reuse papers published by third-party investigators continued to accumulate for at least six years. To study patterns of data reuse directly, we compiled 9,724 instances of third party data reuse via mention of GEO or ArrayExpress accession numbers in the full text of papers. 
The level of third-party data use was high: for 100 datasets deposited in year 0, we estimated that 40 papers in PubMed reused a dataset by year 2, 100 by year 4, and more than 150 data reuse papers had been published by year 5. Data reuse was distributed across a broad base of datasets: a very conservative estimate found that 20% of the datasets deposited between 2003 and 2007 had been reused at least once by third parties. Conclusion. After accounting for other factors affecting citation rate, we find a robust citation benefit from open data, although a smaller one than previously reported. We conclude there is a direct effect of third-party data reuse that persists for years beyond the time when researchers have published most of the papers reusing their own data. Other factors that may also contribute to the citation benefit are considered. We further conclude that, at least for gene expression microarray data, a substantial fraction of archived datasets are reused, and that the intensity of dataset reuse has been steadily increasing since 2003.
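The regression idea in the abstract above can be illustrated with a minimal sketch. It assumes, hypothetically, that the outcome is log-transformed citation counts, so that the coefficient on a "data shared" indicator converts to a percentage benefit via exp(beta) - 1. The data are synthetic and noise-free, a single covariate (log impact factor) stands in for the full covariate set listed in the abstract, and all names are illustrative rather than the authors' code.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear(xtx, xty)


def percent_benefit(beta):
    """Convert a coefficient on log(citations) into a percentage change."""
    return (math.exp(beta) - 1.0) * 100.0


# Synthetic, noise-free data (an assumption for illustration only):
# sharing data multiplies expected citations by 1.09.
TRUE_BETA = math.log(1.09)
rows, log_cites = [], []
for i in range(40):
    shared = i % 2                    # 1 if data were deposited, else 0
    log_if = 0.5 + 0.1 * (i // 2)     # hypothetical log impact factor
    rows.append([1.0, log_if, float(shared)])  # intercept, covariate, indicator
    log_cites.append(1.0 + 0.8 * log_if + TRUE_BETA * shared)

coef = ols(rows, log_cites)
benefit = percent_benefit(coef[2])
print(f"estimated citation benefit: {benefit:.1f}%")
```

Because the covariate enters the model alongside the indicator, the coefficient on the indicator reflects the difference between otherwise similar studies, which is the sense in which the abstract speaks of controlling for known citation predictors.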
Project description:Efforts to make research results open and reproducible are increasingly reflected by journal policies encouraging or mandating authors to provide data availability statements. As a consequence, there has been a strong uptake of data availability statements in recent literature. Nevertheless, it is still unclear what proportion of these statements actually contain well-formed links to data, for example via a URL or permanent identifier, and whether there is added value in providing such links. We consider 531,889 journal articles published by PLOS and BMC, develop an automatic system for labelling their data availability statements according to four categories based on their content and the type of data availability they display, and finally analyze the citation advantage of different statement categories via regression. We find that, following mandated publisher policies, data availability statements become very common. In 2018, 93.7% of 21,793 PLOS articles and 88.2% of 31,956 BMC articles had data availability statements. Data availability statements containing a link to data in a repository, rather than stating that data are available on request or included as supporting information files, are a fraction of the total. In 2017 and 2018, 20.8% of PLOS publications and 12.2% of BMC publications provided data availability statements containing a link to data in a repository. We also find an association between articles that include statements that link to data in a repository and up to 25.36% (± 1.07%) higher citation impact on average, using a citation prediction model. We discuss the potential implications of these results for authors (researchers) and journal publishers who make the effort of sharing their data in repositories. All our data and code are made available in order to reproduce and extend our results.
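To make the labelling step concrete, here is a minimal keyword-rule sketch of how a data availability statement might be sorted into four categories. It is not the authors' system: the abstract does not name the four categories or describe the classifier, so the category names, the regular expressions, and the function `label_das` are all assumptions made for illustration.

```python
import re

# Hypothetical patterns; a real system would need far more careful rules
# (for example, a URL pointing to a non-repository site is mislabelled here).
REPO_LINK = re.compile(r"https?://\S+|\b10\.\d{4,9}/\S+", re.IGNORECASE)
IN_PAPER = re.compile(
    r"\b(supporting information|supplementary|within the (paper|article|manuscript))\b",
    re.IGNORECASE,
)
ON_REQUEST = re.compile(r"\b(up)?on (reasonable )?request\b", re.IGNORECASE)


def label_das(statement):
    """Assign one of four illustrative labels to a data availability statement."""
    if not statement or not statement.strip():
        return "none_or_other"
    if REPO_LINK.search(statement):
        return "repository_link"          # URL or DOI-like identifier present
    if IN_PAPER.search(statement):
        return "in_paper_or_supplement"   # data in the article or its supplements
    if ON_REQUEST.search(statement):
        return "on_request"               # data available from the authors
    return "none_or_other"


examples = [
    "The dataset is deposited at https://example.org/dataset/123.",
    "All relevant data are within the paper and its Supporting Information files.",
    "Data are available from the corresponding author upon reasonable request.",
    "",
]
labels = [label_das(s) for s in examples]
for text, label in zip(examples, labels):
    print(f"{label:24s} {text!r}")
```

The rules are checked in order of specificity, so a statement that both links to a repository and mentions supporting files is counted as containing a link, matching the abstract's focus on statements that link to data in a repository.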
Project description:Rhesus factor polymorphism has been an evolutionary enigma since its discovery in 1939. Carriers of the rarer allele should be eliminated by selection against Rhesus positive children born to Rhesus negative mothers. Here I used an ecologic regression study to test the hypothesis that Rhesus factor polymorphism is stabilized by heterozygote advantage. The study was performed in 65 countries for which the frequencies of RhD phenotypes and specific disease burden data were available. I performed multiple multivariate covariance analysis with five potential confounding variables: GDP, latitude (distance from the equator), humidity, medical care expenditure per capita and frequencies of smokers. The results showed that the burden associated with many diseases correlated with the frequencies of particular Rhesus genotypes in a country and that the direction of the relation was nearly always the opposite for the frequency of Rhesus negative homozygotes and that of Rhesus positive heterozygotes. On the population level, a Rhesus-negativity-associated burden could be compensated for by the heterozygote advantage, but for Rhesus negative subjects this burden represents a serious problem.
Project description:Cumulative advantage, commonly known as the Matthew Effect, influences academic output and careers. Given the challenge and uncertainty of gauging the quality of academic research, gatekeepers often possess incentives to prefer the work of established academics. Such preferences breach scientific norms of universalism and can stifle innovation. This article analyzes repeat authors within academic journals as a possible exemplar of the Matthew Effect. Using publication data for 347 economics journals from 1980 to 2017, as well as from three major generalist science journals, we analyze how articles written by repeat authors fare vis-à-vis less-experienced authors. Results show that articles written by repeat authors steadily decline in citation impact with each additional repeat authorship. Despite these declines, repeat authors also tend to garner more citations than debut authors. These contrasting results suggest both benefits and drawbacks associated with repeat authorships. Journals appear to respond to feedback from previous publications, as more-cited authors in a journal are more likely to be selected for repeat authorships. Institutional characteristics of journals also affect the likelihood of repeat authorship, as well as citation outcomes. Repeat authorships, particularly in leading academic journals, reflect innovative incentives and professional reward structures, while also influencing the intellectual content of science.
Project description:Aims. Over the last two decades, the existence of an open access citation advantage (OACA), that is, increased citation of articles made available open access (OA), has been the topic of much discussion. While there has been substantial research to address this question, findings have been contradictory and inconclusive. We conducted a systematic review to compare studies of citations to OA and non-OA articles. Methods. A systematic search of 17 databases attempted to capture all relevant studies authored since 2001. The protocol was registered in Open Science Framework. We included studies with a direct comparison between OA and non-OA items and reported article-level citation as an outcome. Both randomized and non-randomized studies were included. No limitations were placed on study design, language, or publication type. Results. A total of 5,744 items were retrieved. Ultimately, 134 items were identified for inclusion. 64 studies (47.8%) confirmed the existence of OACA, while 37 (27.6%) found that it did not exist, 32 (23.9%) found OACA only in subsets of their sample, and 1 study (0.8%) was inconclusive. Studies with a focus on multiple disciplines were significantly positively associated with finding that OACA exists in subsets, and were less associated with finding that OACA did not exist. In the critical appraisal of the included studies, 3 were found to have an overall low risk of bias. Of these, one found that an OACA existed, one found that it did not, and one found that an OACA occurred in subsets. Conclusions. As seen through the large number of studies identified for this review, OACA is a topic of continuing interest. Quality and heterogeneity of the component studies pose challenges for generalization. The results suggest the need for reporting guidelines for bibliometrics studies.
Project description:Cuba and the U.S. have the oldest Academies of Sciences outside Europe. Both countries have a long history of scientific collaboration that dates to the 1800s. Both scientific communities also share geographical proximity and common scientific research interests, mainly in Biotechnology, Meteorology, and Public Health research. Despite these facts, scientists from both nations face serious barriers to cooperation raised by the U.S. embargo established in 1961 that prohibits exchanges with Cuba. The study aims to analyze the effects of U.S. policy on scientific collaboration with Cuban scientific institutions. The results of the bibliometric analysis of Cuba-U.S. joint publications in the Web of Science and Scopus databases from 1980 to 2020 indicate sustained growth of scientific collaboration between scientists of both nations over the past forty years. The results also show that the 1980 agreement between the Smithsonian Institution and Cuba's Academy of Sciences significantly increased scientific collaboration between U.S. scientists and their Cuban peers. President Barack Obama's approach to normalizing U.S.-Cuba relations in 2015 enhanced Cuban scientific production with U.S. scientists, with the number of collaborative papers exceeding that published during any preceding U.S. presidential administration. By 2020, Cuba had expanded its scientific links to 80% of the countries in the world. Cuban and U.S. scientists converted from adversaries into partners, showing that science is an effective diplomatic channel. A particularly important question for the future is how robust the collaboration system is in the face of greater political restrictions.
Project description:The present study aimed to explore the modulation of frequency bands (alpha, beta, theta) underlying the positive classification advantage (PCA) for facial expressions within different post-stimulus time intervals (100-200 ms, 200-300 ms, 300-400 ms). For this purpose, we recorded electroencephalogram (EEG) activity during an emotion discrimination task for happy, sad and neutral faces. The correlation between the non-phase-locked power of frequency bands and reaction times (RTs) was assessed. The results revealed that beta played a major role in the PCA within the 100-200 and 300-400 ms intervals, whereas theta was important within the 200-300 ms interval. We propose that the beta band modulated the neutral and emotional face classification process, and that the theta band modulated happy and sad face classification.
Project description:To foster a deeper understanding of the mechanisms behind inequality in society, it is crucial to work with well-defined concepts associated with such mechanisms. The aim of this paper is to define cumulative (dis)advantage and the Matthew effect. We argue that cumulative (dis)advantage is an intra-individual micro-level phenomenon, that the Matthew effect is an inter-individual macro-level phenomenon and that an appropriate measure of the Matthew effect focuses on the mechanism or dynamic process that generates inequality. The Matthew mechanism is, therefore, a better name for the phenomenon, where we provide a novel measure of the mechanism, including a proof-of-principle analysis using disposable personal income data. Finally, because socio-economic theory should be able to explain cumulative (dis)advantage and the Matthew mechanism when they are detected in data, we discuss the types of models that may explain the phenomena. We argue that interactions-based models in the literature traditions of analytical sociology and statistical mechanics serve this purpose.