Fluxus

From ISOGG Wiki

Fluxus is, like Splitstree, a free phylogenetic network program which can be used to generate evolutionary trees (cladograms) and networks from genetic, linguistic, and other data. The program can provide age estimates for any ancestor in the tree.

Its user's guide states: "Network is used to reconstruct phylogenetic networks and trees, infer ancestral types and potential types, evolutionary branchings and variants, and to estimate datings. The algorithms are designed for non-recombining bio-molecules. Successful applications include mtDNA, Y-STR, amino acid, RNA, virus DNA, bacterium DNA, some effectively non-recombining autosomal DNA, and non-biomolecule data such as linguistic data. By contrast, recombining bio-molecules will deliver high-dimensional networks which will be difficult to interpret."

Benefits

The primary benefit of Fluxus to genetic genealogists is that it permits visual assessment of genetic relationships, as inferred by Y-STR data. It reduces many numbers (allele values by marker) to a few (presumed) key differences and displays them pictorially in a network diagram or cladogram.

The value of such a diagram may vary in each instance. It may or may not reveal unknown relationships; it may or may not confirm previous conclusions.

Fluxus (or similar software) is one of many analytic techniques to be employed in genetic genealogy.

Using Fluxus

Fluxus provides many optional adjustments. Their uses may be obvious to the expert cladisticist, but not to many DNA project administrators. Experimentation is recommended.

Fluxus input and output files -- except graphics -- are in plain-text format. No special program is needed to read them; use Notepad or any plain-text editor (though the contents of some files may not be self-explanatory).

Step 1, Data input

Input data must be in the format the program expects. It may be entered manually or imported from rdf, FASTA, NEXUS or Phylip.

Charlie Warthen and Wes Erickson have provided some free software to prepare files for importing into Fluxus which can be downloaded from www.crwarthen.com/downloads.

Also, McGee Utilities may be used to convert Y-STR data into Fluxus-compatible format. Copy the output from McGee into a plain-text file (e.g., Notepad) and save it with a .ych extension. Then, go to step 2.

The .ych input file consists of marker names on the top line, separated by commas. This is followed by two blank lines, then data for the individual taxa, of which the first is the calculated modal. Each taxon's data consists of an identification line and a line of marker allele values, comma-separated.
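For those who prefer to prepare data by script rather than by hand, the layout just described can be sketched in Python. This is a minimal sketch based only on the description in this article. The function name build_ych, the marker set and the allele values are illustrative assumptions, and the modal is assumed to be the most common allele value at each marker. Check the result against a file produced by McGee Utilities before relying on it.

```python
from collections import Counter

def build_ych(markers, haplotypes):
    """Return .ych-style text following the layout described in this article.

    markers: list of marker names.
    haplotypes: dict mapping participant ID -> list of allele values,
                in the same order as `markers`.
    """
    # Assumed definition of the modal: most common allele value per marker.
    columns = list(zip(*haplotypes.values()))
    modal = [Counter(col).most_common(1)[0][0] for col in columns]

    # Marker names on the top line, followed by two blank lines.
    lines = [",".join(markers), "", ""]
    # The modal comes first, then each participant.
    for label, alleles in [("modal", modal)] + list(haplotypes.items()):
        lines.append(label)                              # identification line
        lines.append(",".join(str(a) for a in alleles))  # allele values
    return "\n".join(lines) + "\n"

# Illustrative data only; the allele values are made up.
example = build_ych(
    ["DYS393", "DYS390", "DYS19"],
    {"16678": [13, 24, 14], "56040": [13, 25, 14], "13707": [14, 24, 14]},
)
print(example)
```

Saving the printed text with a .ych extension gives a file in the shape described above.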

The .rdf file eliminates invariable data and looks similar to this, with slashes (/) representing line breaks:

;1.0 / D458aa;D449aa; / 10;10; / >modal;1; / 11 / >16678;1; / 11 / >56040;1; / 11 / >13707;1; / 10 / >94522;1; / 01 /

Step 2, Calculate Network

The user will be offered multiple options including "reduced median" or "median joining". {For STR data, choose median joining, MJ.} Then, open the data file, either a .rdf or .ych file. Clicking the Calculate Network tab will produce a window with calculation results and a dialog box to save an output file with the extension .out. Doing so yields a message that one can go to Draw Network. (A .rdf file will be made during calculations.)

Step 3, Draw Network

Click the Draw Network tab, then File and Open. Change file type from .sto to .out and select your file. A series of dialogue boxes will appear; click Yes and Continue, then Finalize.

After Finalize, a cladogram will be displayed. It is recommended to save the initial image in the .fdi format.

  • {Comment: This graphic is usually very rough; modifying it for visual acceptability is Step 4.}

At this time, you may also view statistics for the network -- number of taxa, number of mutations, list of mutations and the maximum instance of each, etc. This meta-data can be exported to a .sta file.

Initial Drawing

Unmodified Fluxus drawing

Median Vector

Some diagrams will contain one or more nodes labeled, "mv1", "mv2", etc. These are median vectors, inserted by the software as "hypothesised (often ancestral) sequence{s} .. required to connect existing sequences within the network with maximum parsimony. Without the median vector, there would be no shortest connection between the data set's sequences." (User's Guide, p. 17)

A median vector (connecting three or more nodes) may also represent alternate pathways between nodes.
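The idea of a median vector can be illustrated for the simplest case of three sequences. The sketch below takes the majority value at each position and counts differences between sequences. It is only an illustration of the concept under simplifying assumptions (every difference counts as one mutation, regardless of step size), not a description of how Fluxus computes its networks. The function names and allele values are hypothetical.

```python
from collections import Counter

def median_of_three(a, b, c):
    """Per-character majority of three equal-length sequences.

    Positions where all three sequences differ are returned as None,
    because no single majority value exists there.
    """
    if not (len(a) == len(b) == len(c)):
        raise ValueError("sequences must have the same length")
    median = []
    for values in zip(a, b, c):
        value, count = Counter(values).most_common(1)[0]
        median.append(value if count >= 2 else None)
    return median

def distance(x, y):
    """Number of characters at which two sequences differ."""
    return sum(1 for p, q in zip(x, y) if p != q)

# Illustrative haplotypes; each pair differs at two markers.
s1 = [13, 24, 14, 11]
s2 = [14, 24, 15, 11]
s3 = [13, 25, 15, 11]

mv = median_of_three(s1, s2, s3)
print(mv)                                        # [13, 24, 15, 11]
print([distance(mv, s) for s in (s1, s2, s3)])   # [1, 1, 1]
```

Here the median vector matches none of the three inputs, yet it connects all of them with three mutations in total. Linking the three sequences directly to one another would need at least four.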

Step 4, Modify the Network Drawing

This user has observed a number of problems with initial Fluxus diagrams and felt the need to modify them extensively before use in presentations. Problems include

  • Graphic is too large, usually in horizontal direction;
  • With horizontal orientation of arms, mutation labels appear on top of arm lines;
  • Small fonts are illegible;
  • Exact matches occupy the same node; it's not obvious that the node contains more than one person.

One may alter the position of nodes to compact the graphic and make mutation labels readable; a simple click and drag works. When doing so, try to maintain relative lengths of arms; an arm with three mutations should not be shorter than one with one mutation.

One may increase font size, either all at once in the main options box or individually for taxa and mutation labels. Hint: right-click a label to bring up its adjustment possibilities; then, if desired, select "apply to all".

Double-click on a node to bring up the information about the taxa included. For exact matches, right-click the node to bring up its options dialog box. This user's goal is to have as many pie slices as taxa, each with a different color. Click "Add a slice", then reduce the number of slices on the top line until you have as many lines as taxa and each contains only one taxon. Go through them, changing colors as needed.

Modified Drawing

Modified Fluxus drawing

It's recommended that you save the .fdi file as you modify; in case of accident, this avoids starting from scratch. When finished, save as a bitmap (.bmp) file. For further modifications, the .bmp can be imported into graphics editing software.

Hint: The drawing typically includes much blank (wasted) space, adding to file size. Graphics-editing software will allow cropping to only the used portion.

Step 5, Calculate Time

Click Time Estimate on the main Fluxus menu and open the .fdi file. You'll be asked to specify the ancestral node (modal?), then descendant nodes. Finally click Calculate Time to get the results.

{Comment: Some of these estimates may not seem credible. The frequency in years per mutation may be adjusted.}
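To see why the years-per-mutation setting matters so much, consider the arithmetic behind an averaging-based age estimate: take the mean number of mutations separating the descendants from the chosen ancestral node, then multiply by the assumed years per mutation. The sketch below shows that arithmetic only. Whether it matches Fluxus's internal calculation in every detail is an assumption, and the function name, mutation counts and the figure of 500 years per mutation are placeholders, not recommended values.

```python
def average_age(mutation_counts, years_per_mutation):
    """Mean mutations from ancestor to descendants, scaled to years.

    mutation_counts: mutations separating each descendant from the
                     chosen ancestral node (one entry per individual).
    years_per_mutation: assumed average time for one mutation to occur
                        somewhere in the marker set (placeholder value).
    """
    if not mutation_counts:
        raise ValueError("need at least one descendant")
    mean_mutations = sum(mutation_counts) / len(mutation_counts)
    return mean_mutations, mean_mutations * years_per_mutation

# Four descendants that are 0, 1, 1 and 2 mutations from the ancestor.
mean_mutations, age = average_age([0, 1, 1, 2], years_per_mutation=500)
print(mean_mutations, age)   # 1.0 500.0
```

Because the result scales directly with the years-per-mutation figure, halving that figure halves the estimated age.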

Fluxus Criticisms

The software is not (to put it mildly) universally loved. Some stated criticisms include:

  • The tree is unrooted. The nearest equivalent to a root in a Fluxus diagram is the modal node (rearranging the nodes may place it at the center). The modal is assumed by the parsimony criterion to represent the root. Some diagrams may also display a "torso" or ring representing those closest to the modal. The lack of a root may be an effect of using data from presently-living test subjects to infer a no-longer-living ancestor's data.
  • The software is poorly documented. Fluxus Network has a separately downloadable users' manual at http://www.fluxus-engineering.com/Network4611_user_guide.pdf. However, it lacks an interactive help file.
  • The underlying assumptions and processing methods are not stated. {Comment other than "RTFM" needed.}

Notes

Limits

Capability seems robust for most genetic genealogy applications; our data sets may fall into a size category the publishers describe as "trivial". The software's limits are maximums of

  • 3000 data lines (taxa or participants), {30,000 lines for some data types}
  • 1000 data columns (loci, SNPs, nucleotide positions) and
  • 999 "Frequencies" (i.e., same taxa repeated).
  • Sequence identification labels are limited to six (6) alphanumeric characters; longer labels are truncated.
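A data set can be checked against the limits listed above before it is loaded. The following is a minimal sketch: the constants come from the list above, while the function name check_limits and the record layout (a label paired with a frequency) are assumptions made for illustration. It also flags labels that would become identical once truncated to six characters, which is easy to overlook.

```python
MAX_TAXA = 3000        # data lines (taxa or participants)
MAX_COLUMNS = 1000     # data columns
MAX_FREQUENCY = 999    # identical taxa per sequence
MAX_LABEL_LEN = 6      # characters kept in a label

def check_limits(records, n_columns):
    """records: list of (label, frequency) pairs, one per data line."""
    problems = []
    if len(records) > MAX_TAXA:
        problems.append(f"{len(records)} data lines exceeds {MAX_TAXA}")
    if n_columns > MAX_COLUMNS:
        problems.append(f"{n_columns} columns exceeds {MAX_COLUMNS}")
    seen = {}  # truncated label -> first full label that produced it
    for label, freq in records:
        if freq > MAX_FREQUENCY:
            problems.append(f"frequency {freq} for {label!r} exceeds {MAX_FREQUENCY}")
        short = label[:MAX_LABEL_LEN]
        if len(label) > MAX_LABEL_LEN:
            problems.append(f"label {label!r} would be truncated to {short!r}")
        if short in seen and seen[short] != label:
            problems.append(
                f"labels {seen[short]!r} and {label!r} collide after truncation"
            )
        seen.setdefault(short, label)
    return problems

# Illustrative records: two long labels that share their first six characters.
problems = check_limits([("Smith001", 1), ("Smith002", 1), ("16678", 2)], n_columns=37)
for p in problems:
    print(p)
```

Short numeric IDs such as kit numbers avoid the truncation problem altogether.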

Should your application push these limits, either of the following could reduce the size of the data set:

  • Delete invariable data columns;
  • Reduce identical sequences to one sequence and increase "frequency" to suit.
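Both reductions are easy to script ahead of time. The sketch below is one way to do it, assuming the data is held as a list of marker names and a list of allele-value lists. The function name reduce_dataset and the sample values are hypothetical.

```python
from collections import Counter

def reduce_dataset(markers, haplotypes):
    """Drop invariable columns, then merge identical sequences.

    markers: list of marker names.
    haplotypes: list of allele-value lists, one per participant.
    Returns (kept_markers, [(sequence, frequency), ...]).
    """
    # A column is variable if it holds more than one distinct value.
    columns = list(zip(*haplotypes))
    keep = [i for i, col in enumerate(columns) if len(set(col)) > 1]
    kept_markers = [markers[i] for i in keep]

    # Keep only the variable columns, then count identical sequences.
    reduced = [tuple(h[i] for i in keep) for h in haplotypes]
    counts = Counter(reduced)  # keeps first-seen order
    return kept_markers, list(counts.items())

# Illustrative data: DYS19 is the same for everyone, and two rows match.
markers = ["DYS393", "DYS390", "DYS19"]
haplotypes = [[13, 24, 14], [13, 25, 14], [13, 24, 14], [14, 24, 14]]

kept, rows = reduce_dataset(markers, haplotypes)
print(kept)   # ['DYS393', 'DYS390']
print(rows)   # [((13, 24), 2), ((13, 25), 1), ((14, 24), 1)]
```

Four data lines of three columns become three lines of two columns, with the shared sequence carrying a frequency of 2.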

Should processing appear to take too long, it's possible to "kill" it by pressing Control-Alt-Delete. The users' guide contains suggestions to reduce processing time.

Terminology

Some terms used in special senses:

  • "Node" means a position on the network, of which there are two types: taxon (input data) and median vector (calculated to construct shortest path between nodes).
  • "Link" means lines connecting nodes and indicating paths between them.
  • "Character" means a property, for example, the allele value of an STR marker, positive or negative for a SNP, or the presence of C or T at a nucleotide position.
  • "Sequence" means a string of "characters" representing a taxon, group of taxa or test subjects. For example, a "sequence" could represent the marker/allele values for a participant.
  • "Frequency" means the number of individuals (taxa) for that sequence. If greater than one, it means that number of individuals share the identical "sequence".
  • "Invariable data" means data which is the same for all members of the data set and is ignored by Fluxus. For example, if all have DYS393=13, this marker has no effect on the network and does not affect paths; it will be deleted during "Calculate Network".
  • "Variable data" means data for which there is change within the set; variable data affects the tree. For example if some have DYS393=13 and at least one has DYS393=14, DYS393 is variable data; it will be retained and help to draw the network.

Time Estimates

The time estimates feature seems not to be well-adapted to the genealogical time frame.

See also

External links