Software and Data
Some tools that I build and contribute to that you may also find useful.
Software
autoslider. [R] Clinical study slides are normally created by one statistician manually typing in the numbers from outputs and a second statistician double-checking the typed numbers. This process is time-consuming, resource-intensive, and error-prone. Automatic slide generation addresses these issues: it reduces the work and time needed to create slides, and lowers the risk of errors from manually typing or copying numbers from outputs into slides. It also spares users unnecessary stress when many slide decks have to be created in a short time window.
formatters. [R] We provide a framework for rendering complex tables to ASCII, and a set of formatters for transforming values or sets of values into ASCII-ready display strings.
rtables. [R] Reporting tables often have structure that goes beyond simple rectangular data. The ‘rtables’ package provides a framework for declaring complex multi-level tabulations and then applying them to data. This framework models both tabulation and the resulting tables as hierarchical, tree-like objects which support sibling sub-tables, arbitrary splitting or grouping of data in row and column dimensions, cells containing multiple values, and the concept of contextual summary computations. A convenient pipe-able interface is provided for declaring table layouts and the corresponding computations, and then applying them to data.
rtables.officer. [R] Designed to create and display complex tables with R, the ‘rtables’ R package allows cells in an ‘rtables’ object to contain any high-dimensional data structure, which can then be displayed with cell-specific formatting instructions. Additionally, the ‘rtables.officer’ package supports export formats related to the Microsoft Office software suite, including Microsoft Word (‘docx’) and Microsoft PowerPoint (‘pptx’).
rlistings. [R] Listings are often part of the submission of clinical trial data in regulatory settings. We provide a framework for the specific formatting features often used when displaying large datasets in that context.
tern. [R] Tables, Listings, and Graphs (TLG) library for common outputs used in clinical trials.
teal.modules.clinical. [R] This package contains a set of standard teal modules to be used with CDISC data in order to generate many of the standard outputs used in clinical trials.
DEploid. [c++, R] Deconvolves mixed Plasmodium falciparum sequence data and reports the mixture proportions. An R version is available on CRAN.
Hybrid-lambda. [c++] A simulation tool for the lambda coalescent on a given species network. The population structure (species tree/network) is expressed as a Newick string.
Hybrid-coal. [c++] Computes gene tree probabilities given a species network under the coalescent process, using dynamic programming.
smcsmc. [c++] This software package infers demographic events and rates from multiple-sample whole-genome sequence data.
scrm. [c++] A Kingman coalescent simulator suitable for large-scale whole-genome sequence simulations.
Chamaeleo. [python] Chamaeleo is currently the only collection focused on different codec methods for DNA storage. This kit is mainly developed and operated by BGI-Research (Shenzhen). It lets you use classical DNA encoding and decoding methods to save files as DNA sequences or load them back from DNA sequences.
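To give a feel for what saving a file into a DNA sequence means, here is a minimal sketch of the simplest possible codec: two bits per nucleotide. It is an illustration only and does not use Chamaeleo's API; the function names `encode` and `decode` and the A/C/G/T bit assignment are assumptions made for this example. Practical codecs add constraints this sketch ignores, such as limits on homopolymer runs and GC content.

```python
# Hypothetical 2-bit mapping (A=00, C=01, G=10, T=11); not Chamaeleo's API.
BASES = "ACGT"


def encode(data: bytes) -> str:
    """Map each byte to four nucleotides, two bits per base, high bits first."""
    return "".join(
        BASES[(byte >> shift) & 0b11]
        for byte in data
        for shift in (6, 4, 2, 0)
    )


def decode(sequence: str) -> bytes:
    """Invert encode(): every four nucleotides become one byte."""
    if len(sequence) % 4 != 0:
        raise ValueError("sequence length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(sequence), 4):
        byte = 0
        for base in sequence[i:i + 4]:
            # str.index raises ValueError for any character outside ACGT
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


if __name__ == "__main__":
    payload = b"Hi"
    dna = encode(payload)
    print(dna)  # CAGACGGC
    assert decode(dna) == payload
```

The round trip is lossless for any byte string, which is the basic property every DNA storage codec has to preserve.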
I have also contributed to the code base of the following programs:
GA4GH-server. [python] Reference implementation of the APIs defined in ga4gh-schemas.
msprime. [c, python] A reimplementation of Hudson’s classical ms simulator for modern data sets.
TreeWAS. [R] Uses tree-structured healthcare data to perform genetic association studies with UK Biobank data.
Data
Pf3k. Deconvoluted Plasmodium falciparum haplotypes.