Project description: Given trained models from multiple source domains, how can we predict the labels of unlabeled data in a target domain? Unsupervised multi-source domain adaptation (UMDA) aims to predict the labels of unlabeled target data by transferring the knowledge of multiple source domains. UMDA is a crucial problem in many real-world scenarios where no labeled target data are available. Previous approaches to UMDA assume that data are observable across all domains. However, in many practical scenarios source data are not easily accessible due to privacy or confidentiality issues, even though classifiers trained on the source domains are readily available. In this work, we target data-free UMDA, where source data are not observable at all, a novel problem that has not been studied before despite being realistic and crucial. To solve data-free UMDA, we propose DEMS (Data-free Exploitation of Multiple Sources), a novel architecture that adapts target data to source domains without exploiting any source data and estimates the target labels by exploiting pre-trained source classifiers. Extensive experiments for data-free UMDA on real-world datasets show that DEMS provides state-of-the-art accuracy, up to 27.5 percentage points higher than that of the best baseline.
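The description does not specify how DEMS combines the pre-trained source classifiers, so the sketch below only illustrates the general data-free setting it assumes: several frozen source classifiers produce predictions on unlabeled target data, and their outputs are aggregated with per-source weights. The function name, weighting scheme, and dummy classifiers are assumptions for illustration, not the DEMS method itself.

```python
import numpy as np

def ensemble_predict(source_classifiers, target_x, weights=None):
    """Predict target labels by combining frozen source classifiers.

    source_classifiers: callables mapping an (n, d) array of target features
        to an (n, c) array of class probabilities.
    weights: optional per-source weights; uniform if omitted.
    Only the classifiers are used; no source data is accessed.
    """
    if weights is None:
        weights = np.full(len(source_classifiers), 1.0 / len(source_classifiers))
    combined = sum(w * clf(target_x) for w, clf in zip(weights, source_classifiers))
    return combined.argmax(axis=1)

# Toy usage with two dummy "source classifiers" over 3 classes.
rng = np.random.default_rng(0)
clf_a = lambda x: rng.dirichlet(np.ones(3), size=len(x))
clf_b = lambda x: rng.dirichlet(np.ones(3), size=len(x))
labels = ensemble_predict([clf_a, clf_b], rng.normal(size=(5, 4)))
print(labels)
```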
Project description: Building data center networks with SDN technology has become a hot research topic. The SDN architecture separates the control plane from the data plane, which makes the network more software-oriented and agile. Moreover, it provides virtual multi-tenancy, effective resource scheduling, and centralized control strategies to meet the demands of cloud computing data centers. However, the explosion of network information poses severe challenges for SDN controllers, and flow storage and lookup mechanisms based on TCAM devices restrict scalability while incurring high cost and energy consumption. In view of this, a source-controlled data center network (SCDCN) model is proposed herein. The SCDCN model applies a new type of source routing address, named the vector address (VA), as the packet-switching label. The VA completely defines the communication path, so data forwarding can be completed relying solely on the VA. The SCDCN architecture has four advantages. 1) The model adopts hierarchical multi-controllers and abstracts a large-scale data center network into small network domains, which removes the processing-ability bottleneck of a single controller and reduces computational complexity. 2) Vector switches (VS) deployed in the core network no longer use TCAM for table storage and lookup, which significantly cuts down the cost and complexity of the switches and effectively solves the scalability problem. 3) The SCDCN model simplifies the establishment of new flows: there is no need to download flow tables to the VS, so the amount of control signaling consumed when establishing new flows is significantly decreased. 4) We implemented the VS on the NetFPGA platform; the statistical results show that the hardware resource consumption of a VS is about 27% of that of an OpenFlow switch (OFS).
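Since the VA fully encodes the path, each vector switch can forward a packet by consuming its own portion of the label instead of performing a flow-table lookup. The description does not give the actual VA encoding, so the following is a minimal sketch under the assumption that a VA is simply the ordered list of output ports along the path, with each hop popping the head of the list.

```python
def build_vector_address(output_ports):
    """Source-routing label: the ordered output ports along the path (assumed encoding)."""
    return list(output_ports)

def forward(packet_va):
    """One forwarding step at a vector switch: consume the head of the VA.

    Returns (output_port, remaining_va). No flow-table lookup (and hence no TCAM)
    is needed; the label alone determines the next hop.
    """
    port, *rest = packet_va
    return port, rest

va = build_vector_address([3, 1, 4])   # path: port 3, then port 1, then port 4
while va:
    port, va = forward(va)
    print(f"forward out of port {port}, remaining VA: {va}")
```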
Project description: Background: As highlighted by the COVID-19 pandemic, researchers are eager to make use of a wide variety of data sources, both government-sponsored and alternative, to characterize the epidemiology of infectious diseases. To date, few studies have investigated the strengths and limitations of the sources currently being used for such research. These strengths and limitations are critical for policy makers to understand when interpreting study findings. Methods: To fill this gap in the literature, we compared infectious disease reporting for three diseases (measles, mumps, and varicella) across four data sources: Optum (health insurance billing claims data), HealthMap (online news surveillance data), Morbidity and Mortality Weekly Reports (official government reports), and the National Notifiable Disease Surveillance System (government case surveillance data). We reported the yearly number of national- and state-level disease-specific case counts and disease clusters according to each source during a five-year study period (2013–2017). Findings: Our study demonstrated drastic differences in reported infectious disease incidence across data sources. When compared against the other three sources of interest, Optum data showed substantially higher, implausible standardized case counts for all three diseases. Although there was some concordance in identified state-level case counts and disease clusters, all four sources exhibited variations in state-level reporting. Interpretation: Researchers should consider data source limitations when attempting to characterize the epidemiology of infectious diseases. Some data sources, such as billing claims data, may be unsuitable for epidemiological research in the infectious disease context.
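The comparison described above boils down to aggregating yearly, state-level case counts per source and placing the sources side by side. The sketch below illustrates that aggregation on a tiny hypothetical table; the column names and values are assumptions for illustration, not the study's actual data.

```python
import pandas as pd

# Hypothetical long-format table of reported cases; columns and values are illustrative only.
cases = pd.DataFrame({
    "source":  ["Optum", "Optum", "HealthMap", "NNDSS"],
    "disease": ["measles", "mumps", "measles", "measles"],
    "state":   ["NY", "NY", "CA", "NY"],
    "year":    [2013, 2014, 2013, 2013],
    "count":   [12, 30, 4, 10],
})

# Yearly state-level case counts per source and disease, then a side-by-side comparison.
yearly = cases.groupby(["source", "disease", "state", "year"], as_index=False)["count"].sum()
comparison = yearly.pivot_table(index=["disease", "state", "year"],
                                columns="source", values="count")
print(comparison)
```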
Project description: Genome projects and multiomics experiments generate huge volumes of data that must be stored, mined, and transformed into useful knowledge. All this information should remain accessible and, if possible, browsable afterwards. Computational biologists have been dealing with this scenario for more than a decade, implementing software and databases to meet the challenge. The GMOD (Generic Model Organism Database) biological relational database schema, known as Chado, is one of the few successful open source initiatives; it is widely adopted, and many software packages are able to connect to it. We have been developing an open source software package named Machado, a genomics data integration framework implemented in Python, to enable research groups to both store and visualize genomics data. The framework relies on the Chado database schema and should therefore be very intuitive for current developers to adopt or to run on top of already existing databases. It has several data-loading tools for genomics and transcriptomics data, as well as for annotation results from tools such as BLAST, InterProScan, OrthoMCL, and LSTrAP. There is an API to connect to JBrowse, and a web visualization tool is implemented using Django views and templates. The Haystack library, integrated with the Elasticsearch engine, was used to implement a Google-like search, i.e., a single auto-complete search box that provides fast results and filters. Machado aims to be a modern object-relational framework that uses the latest Python libraries to produce an effective open source resource for genomics research.
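To make the Haystack/Elasticsearch search concrete, here is a minimal django-haystack sketch of an index with an edge-n-gram field driving auto-completion. The Feature model, app path, and field names are assumptions chosen for illustration (Chado has a feature table, but these are not Machado's actual index definitions).

```python
# search_indexes.py — minimal django-haystack sketch; model and field names are hypothetical.
from haystack import indexes
from myapp.models import Feature  # hypothetical Django model mapping a Chado table

class FeatureIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True, use_template=True)
    name_auto = indexes.EdgeNgramField(model_attr="name")  # feeds the auto-complete box

    def get_model(self):
        return Feature

# In a Django view, an auto-complete lookup backed by Elasticsearch would look like:
# from haystack.query import SearchQuerySet
# results = SearchQuerySet().autocomplete(name_auto="kinas")  # matches "kinase", "kinases", ...
```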