
Borland Genetics

From ISOGG Wiki

Borland Genetics is a free DNA software toolkit designed specifically to assist in reconstructing genomes of deceased ancestors and relatives whose DNA is unavailable for testing via traditional means. The software allows users to design and implement custom DNA reconstruction workflows using a series of simple tools that operate on raw DNA resources. The toolkit was publicly released on October 30, 2018.

Compatibility

Borland Genetics tools are compatible with build 37 raw DNA exports from AncestryDNA, 23andMe, MyHeritage and Family Tree DNA (build 37 concatenated). Synthetic output kits are compatible with GEDmatch and/or GEDmatch Genesis. Several tools also allow import of CSV output from DNA Painter to serve as a reconstruction road map.

Definitions/terminology introduced

Borland Genetics categorizes DNA resources as either "mono" or "stereo." A mono kit contains data from only one copy of each chromosome at any given position. Mono kits are not necessarily phased in the traditional sense, since they may contain segments from both the paternal and maternal copies; rather, they are phased at the block or segment level. In contrast, a stereo kit contains at least one region where a paternal and a maternal block or segment overlap. Factory DNA kits from the major testing companies are stereo kits, as paternal and maternal blocks and segments overlap throughout.

DNA resources are also categorized as either "full" or "partial" based on reconstruction coverage of the autosomal and X chromosomes. A factory DNA kit exhibits 100% coverage and is considered a full stereo resource. Traditionally phased kits (phased using a parent and a child) exhibit 50% coverage and are considered full mono resources. Partially reconstructed resources, such as output from the GEDmatch Lazarus tool, are considered partial resources. Partial resources generated by Borland Genetics are not compatible with the ethnicity/admixture tools on GEDmatch without further processing using GEDmatch's native Lazarus tool.

Resolution of a DNA resource (or element thereof) is defined as the percentage of reported SNPs that are not no-calls, and it can be measured with the Borland Genetics Test Resolution tool.
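The following Python sketch illustrates this definition by computing resolution from a list of genotype strings. It assumes that no-calls are written as "--", "0" or "00". Vendor files differ, so treat this marker set as an assumption. The function name resolution is hypothetical and is not part of Borland Genetics.

```python
# Assumed no-call markers; real raw DNA exports vary by vendor and file version.
NO_CALL_MARKERS = {"--", "0", "00", ""}


def resolution(genotypes):
    """Return the percentage of reported SNPs whose value is not a no-call."""
    if not genotypes:
        return 0.0
    called = sum(1 for g in genotypes if g.strip() not in NO_CALL_MARKERS)
    return 100.0 * called / len(genotypes)


# Example: three of four reported SNPs carry a call (mono kits may use one letter).
print(resolution(["AG", "--", "CC", "T"]))  # 75.0
```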

The descendants of a target of reconstruction are divided into sibling clades. Each sibling clade consists of the genetic descendants of a single child of the reconstruction target. Donors within a sibling clade are referred to as contributors to that clade. Contributors to a reconstruction workflow are in-clade or cross-clade depending on whether they all belong to a single sibling clade with respect to the target of reconstruction. Reconstructions using solely in-clade donors always result in mono output, whereas reconstructions that rely upon cross-clade contributors typically result in stereo output. The distinction is crucial when designing workflows that use multiple tools.
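The in-clade/cross-clade rule can be expressed as a small Python sketch. It assumes each contributor is described by a hypothetical lineage list. The list starts with the child of the target through whom the contributor descends. The function names are illustrative only. Cross-clade reconstructions are only typically stereo, so the sketch reports the expected output type rather than a guarantee.

```python
def sibling_clade(lineage):
    """lineage: names from the target's child down to the donor, e.g. ["Bob", "Ann"].

    The first entry (the target's child) identifies the donor's sibling clade.
    """
    return lineage[0]


def expected_output(contributors):
    """contributors: mapping of donor name -> lineage list. Returns 'mono' or 'stereo'."""
    clades = {sibling_clade(lineage) for lineage in contributors.values()}
    # In-clade only: every donor descends through the same child, so output is mono.
    # Cross-clade: donors descend through different children, typically giving stereo.
    return "mono" if len(clades) <= 1 else "stereo"


print(expected_output({"Ann": ["Bob", "Ann"], "Cy": ["Bob", "Dee", "Cy"]}))  # mono
print(expected_output({"Ann": ["Bob", "Ann"], "Eve": ["Fay", "Eve"]}))       # stereo
```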

Template conversion

The Chameleon tool allows users to convert DNA resources between factory templates or onto custom templates. It operates in three modes. In the first mode, users can convert a raw DNA file from its original template to that of another DNA resource. For example, if you provide the tool with an AncestryDNA v1 kit and a 23andMe v5 kit, you can map either kit onto the template of the other. The second mode allows users to map two kits on different templates to a combined template consisting of all of the tested SNPs reported in either template. The third mode allows mapping to a shared template consisting only of the SNPs shared by the templates of the two input kits.
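The three Chameleon modes correspond to simple set operations on SNP identifiers. This Python sketch assumes that a kit is a dictionary keyed by rsID and that a template is an ordered list of rsIDs. It also assumes "--" marks SNPs the source kit did not report. These representations and the function names are assumptions and do not reflect the toolkit's internal format.

```python
NO_CALL = "--"  # assumed placeholder for SNPs missing from the source kit


def map_to_template(kit, template):
    """Mode 1: re-express a kit (dict rsID -> genotype) on another kit's template."""
    return {rsid: kit.get(rsid, NO_CALL) for rsid in template}


def combined_template(template_a, template_b):
    """Mode 2: every SNP reported in either template, keeping first-seen order."""
    seen = set(template_a)
    return list(template_a) + [rsid for rsid in template_b if rsid not in seen]


def shared_template(template_a, template_b):
    """Mode 3: only the SNPs reported in both templates."""
    in_b = set(template_b)
    return [rsid for rsid in template_a if rsid in in_b]


kit_a = {"rs1": "AG", "rs2": "CC"}
print(map_to_template(kit_a, ["rs2", "rs3"]))             # {'rs2': 'CC', 'rs3': '--'}
print(combined_template(["rs1", "rs2"], ["rs2", "rs3"]))  # ['rs1', 'rs2', 'rs3']
print(shared_template(["rs1", "rs2"], ["rs2", "rs3"]))    # ['rs2']
```

A real implementation would likely order SNPs by chromosome and position rather than by first appearance.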

Some tools require users to convert all input resources to a common template prior to use.

Basic phasing tools

The stand-alone phasing tools packaged in Borland Genetics include the Phoenix tool and its counterpart, the Darkside tool. These tools reconstruct a partial or full phased parent kit using relatives on the same side or the opposite side of the family as the reconstruction target, respectively. Both tools produce mono output, as they reconstruct portions of a single copy of the child donor's DNA.

The Missing Parent tool serves as a workflow to reconstruct the DNA of an unavailable parent. It phases the DNA of an available parent with the DNA of as many of the couple's children as resources permit. Using more than one child's kit as input produces stereo output, while using a single child produces mono output.
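The core inference behind this kind of workflow is standard Mendelian logic. At each SNP, any child allele that could not have come from the available parent must have come from the missing one. The Python sketch below illustrates that idea for unordered two-letter genotypes. It is not the tool's actual algorithm, and the function names are hypothetical. Collecting results across several children shows why multiple children can yield stereo output: different children may reveal both of the missing parent's alleles.

```python
def is_call(genotype):
    """True for a two-allele genotype such as 'AG'; assumes '-' or '0' mark no-calls."""
    return len(genotype) == 2 and not any(a in "-0" for a in genotype)


def missing_parent_allele(known_parent, child):
    """Allele the child must have received from the missing parent, or None if unknown."""
    if not (is_call(known_parent) and is_call(child)):
        return None
    a, b = child
    if a == b:
        # Homozygous child: both parents passed this allele (None on a Mendelian conflict).
        return a if a in known_parent else None
    if a in known_parent and b not in known_parent:
        return b
    if b in known_parent and a not in known_parent:
        return a
    return None  # e.g. parent AG, child AG: either allele could be from either parent


def missing_parent_alleles(known_parent, children):
    """Distinct missing-parent alleles revealed by several children at one SNP."""
    found = {missing_parent_allele(known_parent, c) for c in children}
    found.discard(None)
    return found


print(missing_parent_allele("AA", "AG"))                   # G
print(sorted(missing_parent_alleles("AA", ["AG", "AA"])))  # ['A', 'G']
```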

The Two Parent Phasing tool is a simple workflow that phases a donor’s genome using data from both parents, significantly increasing the resolution of traditionally phased output kits.
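The sketch below phases one SNP of a child using both parents' genotypes, which shows why a second parent raises resolution. With only one parent, a heterozygous child whose parent is also heterozygous (child AG, parent AG) cannot be phased. A homozygous second parent often resolves such positions. This is a textbook trio-phasing rule, not the tool's own code. The function names and genotype format are assumptions.

```python
def is_call(genotype):
    """True for a two-allele genotype such as 'AG'; assumes '-' or '0' mark no-calls."""
    return len(genotype) == 2 and not any(a in "-0" for a in genotype)


def phase_trio(child, father, mother):
    """Return (paternal_allele, maternal_allele) for one SNP, or None if unresolved."""
    if not all(is_call(g) for g in (child, father, mother)):
        return None
    a, b = child
    if a == b:
        # Homozygous child: consistent only if each parent carries the allele.
        return (a, a) if a in father and a in mother else None
    a_from_father = a in father and b in mother
    b_from_father = b in father and a in mother
    if a_from_father and not b_from_father:
        return (a, b)
    if b_from_father and not a_from_father:
        return (b, a)
    return None  # all three heterozygous, or a Mendelian conflict


print(phase_trio("AG", "AG", "AA"))  # ('G', 'A')
print(phase_trio("AG", "AG", "AG"))  # None
```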

Advanced phasing tools

Borland Genetics includes two advanced phasing workflows that interact with output from visual phasing projects stored in DNA Painter profiles. These are the Reverse Phase and Extract Segments tools, both of which are essential component workflows for any complex reconstruction project executed via the toolkit. Additionally, the Ultimate Phaser tool provides direct access to unbound data from the toolkit's phasing engine.

The Extract Segments tool effectively uses a DNA Painter ancestor group as a filter, applied to a mono kit to extract the segments inherited from the corresponding ancestor. Segment boundaries can be entered into or imported into DNA Painter using either Build 36 or Build 37 coordinates, because segment data is automatically converted to Build 37 prior to extraction. Output is “bound” because all extraneous data in the input kit is replaced by “white noise” designed to prevent triggering false cousin matches upon upload to GEDmatch.
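The filtering step can be sketched in Python as follows. The sketch assumes a mono kit is a list of (chromosome, position, allele) tuples and that segment boundaries have already been converted to Build 37. It also approximates "white noise" with random bases. The toolkit's actual noise scheme, designed specifically to avoid false GEDmatch matches, is not described here, so that substitution is an assumption. The function name is hypothetical.

```python
import random

BASES = "ACGT"


def extract_segments(mono_kit, segments, seed=None):
    """Keep alleles inside the given segments and replace everything else with noise.

    mono_kit: list of (chromosome, position, allele) tuples (Build 37).
    segments: list of (chromosome, start, end) tuples, inclusive, Build 37.
    """
    rng = random.Random(seed)
    bound = []
    for chrom, pos, allele in mono_kit:
        inside = any(c == chrom and start <= pos <= end for c, start, end in segments)
        # Random bases stand in for the toolkit's own white-noise scheme (assumption).
        bound.append((chrom, pos, allele if inside else rng.choice(BASES)))
    return bound


kit = [("1", 100, "A"), ("1", 5_000, "G"), ("2", 150, "T")]
ancestor_segments = [("1", 50, 200), ("2", 100, 200)]
print(extract_segments(kit, ancestor_segments, seed=1)[0])  # ('1', 100, 'A')
```

For a full kit with hundreds of thousands of SNPs, sorting the segments per chromosome and using binary search would be faster than the linear scan shown here.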

The Reverse Phase tool accomplishes full paternal vs. maternal phasing of a donor’s chromosomes using DNA from a child rather than a parent. Unlike traditional phasing, the process is not fully automatic. It requires an additional, simplified visual phasing step using DNA Painter and the GEDmatch Matching Segments tool. Output may or may not include white noise, depending on whether there were gaps between segments as painted. This distinction determines whether the output is compatible with GEDmatch admixture tools.

The most frequent application of the Ultimate Phaser tool is to create full mono kits for use in visual phasing projects. These visual phasing seed kits should always be marked as research kits when uploaded to GEDmatch, as they do not represent the reconstruction of a single real person.

Output from the Ultimate Phaser (when applied to a parent and child) can be channeled to one of the following three categories of phased data:

  • Parent ∩ Child (all DNA shared between a parent and child, which alternates between grandparent streams at recombination points determined via visual phasing)
  • Parent x Child (sometimes referred to as the "evil twin": all DNA not passed from the parent to the child, which also alternates between grandparent streams at recombination points, but in the opposite phase to the previous category)
  • Child x Parent (the DNA the child inherited from the other parent)
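The three categories can be illustrated at a single SNP by enumerating which of the child's alleles could have come from the parent. The Python sketch below does this for one parent/child pair and returns a result only when the answer is unambiguous. It does not perform the stitching across recombination points that the real tool relies on. The function names are hypothetical.

```python
def is_call(genotype):
    """True for a two-allele genotype such as 'AG'; assumes '-' or '0' mark no-calls."""
    return len(genotype) == 2 and not any(a in "-0" for a in genotype)


def split_parent_child(parent, child):
    """Return (parent ∩ child, parent x child, child x parent) alleles, or None if ambiguous."""
    if not (is_call(parent) and is_call(child)):
        return None
    candidates = set()
    for i in range(2):  # hypothesis: child allele i came from this parent
        shared, child_only = child[i], child[1 - i]
        if shared in parent:
            rest = list(parent)
            rest.remove(shared)  # the parental allele that was not passed on
            candidates.add((shared, rest[0], child_only))
    # Exactly one consistent hypothesis means the split is unambiguous.
    return candidates.pop() if len(candidates) == 1 else None


print(split_parent_child("AA", "AG"))  # ('A', 'A', 'G')
print(split_parent_child("AG", "GT"))  # ('G', 'A', 'T')
print(split_parent_child("AG", "AG"))  # None
```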

However, the Ultimate Phaser can be applied to any two related individuals, resulting in unbound output. Where the two input kits share half-identical regions (HIR), the result is meaningful phased data. In fully identical regions (FIR) or not identical regions (NIR), the tool produces data devoid of genealogical significance. The kit must therefore be bound before it contributes to a reconstruction project. The toolkit allows flexible binding: the Extract Segments tool acts as a custom filter, with user-selected thresholds applied via GEDmatch segment import.

Tools in development

Certain features and tools in the current version (v1.5) of the toolkit are disabled because they are in various stages of testing and development. Future versions of the software will have these features enabled. The Script Manager (used to develop some of the stand-alone phasing workflows) will allow users to write or record scripts for custom workflows and share them with other users. The tool is disabled pending a simplification of the script syntax, which currently requires knowledge of Python.

The Creeper tool allows users to enter a family tree data structure representing the familial relationships between resource donors. It uses lightweight artificial intelligence to suggest reconstruction workflows based on available resources and family connections. The tool has been designed and partially coded, but is not yet ready for beta testing and AI training.

While the phasing engine of Borland Genetics already imputes some data to its output kits, the Imputer tool is designed to let users add customized segments of imputed DNA data to fill gaps in partial reconstructions. Used in conjunction with the Creeper tool, the software suggests data corresponding to missing segments in a partial reconstruction, based on a user-selected confidence threshold between 75% and 99%. The underlying technology has already been developed, but the tool remains disabled until the Creeper tool is completed.
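The confidence threshold described here can be pictured as a simple gate on candidate imputed calls. In the Python sketch below, the suggestion format and the function name are hypothetical, because the Imputer's interface has not been released. Each suggestion maps an rsID to a genotype and a confidence between 0 and 1. Only the 75% to 99% range comes from the description above.

```python
NO_CALL = "--"  # assumed no-call placeholder


def apply_imputation(kit, suggestions, threshold):
    """Fill no-calls in kit with suggested genotypes whose confidence meets threshold.

    kit: dict rsID -> genotype; suggestions: dict rsID -> (genotype, confidence 0..1).
    """
    if not 0.75 <= threshold <= 0.99:
        raise ValueError("threshold must be between 0.75 and 0.99")
    filled = dict(kit)  # leave the input kit unchanged
    for rsid, (genotype, confidence) in suggestions.items():
        # Only fill gaps that already exist on the kit's template.
        if filled.get(rsid) == NO_CALL and confidence >= threshold:
            filled[rsid] = genotype
    return filled


partial = {"rs1": "A", "rs2": NO_CALL, "rs3": NO_CALL}
candidates = {"rs2": ("G", 0.92), "rs3": ("T", 0.60)}
print(apply_imputation(partial, candidates, threshold=0.80))
# {'rs1': 'A', 'rs2': 'G', 'rs3': '--'}
```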

Developers

Kevin Borland (USA) holds a BS in Physics from the Massachusetts Institute of Technology (MIT) and has been writing scripts for processing raw DNA since 2013. Kevin designed the toolkit to simplify reconstruction workflows. His aims were to assist more efficiently the clients he helps as a volunteer genetic genealogist and to further his own ancestor reconstruction goals. (Kevin appeared in several episodes of season two of the genealogy-themed reality television show Relative Race and assisted as a consultant prior to the season's filming.)

Leonardo Alminana (Argentina) provided assistance in low-level programming, converting the toolkit into a form that could be distributed across Windows and Mac platforms. The Mac version of the software is expected to be released shortly, pending issuance of a security certificate.

Steven Borland (USA), an expert in the Python programming language, assisted in the debugging of several key features of the tool.

Borland Genetics' team of 70 volunteer beta testers was vital in transitioning the toolkit from a personal science lab to a commercial-grade software resource suitable for distribution among the genetic genealogy community. Testers who went above and beyond expectations and provided indispensable input include Jason Porteous (Canada), Rusty Erpenbeck (USA) and Gonçalo Marques (Portugal).

Links

  • Borland Genetics, direct link to a Dropbox folder from where the Windows version of the software can be downloaded, along with an instruction manual and press release.
  • Borland Genetics Users Group, a Facebook group where users of the toolkit interact as a community to assist one another in designing and implementing reconstruction workflows.