STR Match Finder

From ISOGG Wiki

Ideas (Admins) STR Match Finder

Talk area for ideas on how to extend STR Match Finder (SMF) into a toolset that replaces or supplements GAP, tested on the J2 project and its subprojects. Set up by --ChrisR (talk) 16:37, 18 April 2019 (EDT).

Because of privacy restrictions on data hosting, most functions should be available only to logged-in admins/users; some could be available both with and without login.

Main project data

Probably the best way would be to include all needed data (in CSV format?) from GAP > Project Admin > Download Files, possibly with a semi-automatic update:

Primary data

  • Y-DNA Results Classic: project group name (per group), kit number, name, paternal ancestor, country, Hg, DYS-Values;
  • Paternal Ancestry: project group name (per kit, mouse-over Hg?), comment, location/coordinates (mouse-over country with link to map?)

Useful data

  • Received Lab Results: to quickly see if a kit has BigY results (output behind Hg?)
  • Pending Lab Results: to quickly see if BigY, SNP-Pack or STR upgrade is pending (output behind Hg with mouse-over for details?)
  • Y-DNA SNP: to quickly check for a certain SNP result (Hg mouse-over?)

Additional data

  • Member Information: email address per kit (mouse-over to kit-ID with "name <email@address>"?)
  • Member Notes: by the Admins (mouse-over to kit-ID?) - not used in J2-M172 so far

Secondary project data

Could include the same types of data for kits that are not in the main project. If a kit is also present in the main project, all data from the secondary project except the project group name should be dropped. Maybe accumulate data from multiple projects and show it as a mouse-over?
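The merge rule for main and secondary data could be sketched roughly as below. This is only an illustration: the record layout (a dict per kit with a "group" key), the function name merge_projects and the sample kit numbers are assumptions, not anything SMF or GAP actually provides.

```python
def merge_projects(main, secondary):
    """Merge two projects given as dicts of kit number -> record dict.

    Hypothetical layout: every record has at least a "group" key.
    Main project data always wins; from the secondary project only the
    group name is kept for kits that are already in the main project.
    """
    merged = {}
    for kit, rec in main.items():
        # Copy the main record and add an empty list for secondary group names.
        merged[kit] = dict(rec, secondary_groups=[])
    for kit, rec in secondary.items():
        if kit in merged:
            # Kit already known from the main project: keep the group name only.
            merged[kit]["secondary_groups"].append(rec["group"])
        else:
            # Kit only in the secondary project: keep its full record.
            merged[kit] = dict(rec, secondary_groups=[])
    return merged


# Made-up example records
main = {"1001": {"group": "J2b-M241", "hg": "J-M241", "country": "Italy"}}
secondary = {
    "1001": {"group": "Cluster A", "hg": "J-M172", "country": "Unknown"},
    "2002": {"group": "Cluster B", "hg": "J-M241", "country": "Greece"},
}
merged = merge_projects(main, secondary)
```

The secondary_groups list could then feed a mouse-over, and switching main and secondary is just a matter of swapping the two arguments.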

Inclusion of more external sources

STR values and kit/country data from other sources (public projects, study papers, old offline sources, maybe the YSTRSearcher.2016-11 database). Again, if a kit number is already in the main and/or secondary data, the additional data should be dropped. Copy and paste from public projects (saved in a TXT file?), including all non-STR data (ID, ancestor origin, country, Hg; see MODIFIED Y-Utility), would be optimal.

Especially for geographical projects as well as surname projects, a database of "reference haplotypes for every haplogroup" (maybe Y67+ with BigY or Deep Clade results) would be very useful. This would turn the tool into a combined haplogroup predictor and STR match finder. Not sure, though, how that reference database can best be collected and kept updated.
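One simple way such a predictor could work is to pick the reference haplotype with the smallest genetic distance to the query. The sketch below assumes a plain step-wise distance (sum of absolute allele differences over shared markers) and a made-up reference list; the function names and all marker values are hypothetical, and a real predictor would probably need something more refined.

```python
def genetic_distance(a, b):
    """Step-wise GD over the markers present in both haplotypes.

    Haplotypes are dicts of marker name -> allele value (assumed layout).
    """
    shared = a.keys() & b.keys()
    return sum(abs(a[m] - b[m]) for m in shared)


def predict_haplogroup(query, references):
    """Return (haplogroup, gd) of the closest reference haplotype.

    references: list of (haplogroup label, haplotype dict) pairs.
    """
    scored = ((hg, genetic_distance(query, ht)) for hg, ht in references)
    return min(scored, key=lambda pair: pair[1])


# Made-up reference haplotypes, for illustration only
references = [
    ("J-M67", {"DYS393": 12, "DYS390": 23, "DYS19": 14}),
    ("J-M241", {"DYS393": 12, "DYS390": 24, "DYS19": 15}),
]
query = {"DYS393": 12, "DYS390": 24, "DYS19": 16}
best = predict_haplogroup(query, references)
```

With these example values the query is at GD 3 from the first reference and GD 1 from the second, so best is ("J-M241", 1).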

Data management

  • Switching main and secondary project should be easy, so that multiple "SMF Admin instances" can be based on the same data: for example J2 (main) with M241 (secondary) and vice versa.
  • If multiple admins/projects participate, the constantly updated data could also be conveniently used for PhyloGeographer functions by Hunter, though privacy would limit public output to maybe country or region etc.

Query functions

The following functionality could be useful:

  • GD compare: more options, maybe allowing Y12, Y23, Y25, Y43, "STR number/values of Query sample" and other comparisons with low-resolution or study-lab results. Maybe check the most interesting papers for J2 samples from otherwise poorly sampled areas.
  • GD cutoff: maybe when GD compare is changed, this should change to a useful value like 20% of GD compare?
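The two options above could interact roughly as follows. This sketch assumes haplotypes are stored as lists in a fixed marker order, with None for untested markers, and uses a plain step-wise distance. The names gd_compare and default_cutoff are made up for illustration, and rounding the 20% down is just one possible choice.

```python
def gd_compare(query, other, n_markers):
    """Step-wise GD over the first n_markers positions.

    Positions where either sample has no value (None) are skipped, so
    low-resolution or study samples can still be compared.
    """
    total = 0
    for q, o in zip(query[:n_markers], other[:n_markers]):
        if q is not None and o is not None:
            total += abs(q - o)
    return total


def default_cutoff(n_markers, percent=20):
    """Default GD cutoff as a percentage of the compared markers, rounded down."""
    return n_markers * percent // 100


# Made-up values: four markers, the last one untested in the second sample
query = [13, 24, 14, 11]
other = [13, 23, 15, None]
gd_all = gd_compare(query, other, 4)   # differences 0 + 1 + 1, last skipped
gd_two = gd_compare(query, other, 2)   # only the first two markers
cutoffs = {n: default_cutoff(n) for n in (12, 25, 37, 67, 111)}
```

With a 20% default this gives cutoffs of 2, 5, 7, 13 and 22 for Y12, Y25, Y37, Y67 and Y111; the user could still override the value after changing GD compare.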

Question: could a Genetic Distance calculated the way FTDNA does it in Y-DNA Matches be added? See Hybrid GD calculations used by FTDNA since 2016-07