biotorrents / biophp
BioPHP implements some light tools for manipulating genomic data
dev-master
2021-07-23 19:02 UTC
Requires
- php: >=7.4
- ext-blake3: *
- ext-curl: *
- pear/math_biginteger: ^1.0.3
Requires (Dev)
- d11wtq/boris: ^1.0.10
This package is auto-updated.
Last update: 2024-10-24 02:38:51 UTC
README
BioPHP implements a selection of simple tools for manipulating genomic data. It aims to build tools for basic RNA, DNA, and protein manipulation. BioTorrents.de's fork is designed for biotorrents/gazelle and is heavily inspired by TimothyStiles/poly.
Simple Usage (to be revised)
Find Reverse Complement
    $BioPHP = new BioPHP();
    $result = $BioPHP->reverse('ATGAAAGCATC');
    $result = $BioPHP->complement($result);
    echo $result."\n"; // prints GATGCTTTCAT
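For reference, the same operation can be sketched in plain PHP without the library. This is an illustration only: `reverseComplement` is a hypothetical helper, not a BioPHP method, and it assumes an unambiguous DNA alphabet (A, C, G, T).

```php
<?php
// Hypothetical helper (not part of BioPHP): reverse the strand, then swap each base.
function reverseComplement(string $dna): string
{
    // strtr() with an array replaces every base in one pass,
    // so the A->T and T->A substitutions cannot undo each other.
    return strtr(strrev(strtoupper($dna)), ['A' => 'T', 'T' => 'A', 'G' => 'C', 'C' => 'G']);
}

echo reverseComplement('ATGAAAGCATC'), "\n"; // prints GATGCTTTCAT
```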
Calculate GC Content
    $BioPHP = new BioPHP();
    echo $BioPHP->gcContent('ATGAAAGCATC', 4)."\n"; // prints 36.3636
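GC content is the percentage of bases that are G or C. A minimal sketch of that calculation follows; `gcPercent` is a hypothetical name, and treating the second argument as the number of decimal places is an assumption based on the example output above.

```php
<?php
// Hypothetical helper (not part of BioPHP): percentage of G and C bases.
function gcPercent(string $dna, int $precision = 4): float
{
    $dna = strtoupper($dna);
    $length = strlen($dna);
    if ($length === 0) {
        return 0.0; // avoid division by zero on empty input
    }
    $gc = substr_count($dna, 'G') + substr_count($dna, 'C');
    // Assumption: $precision is the number of decimal places to keep.
    return round(100 * $gc / $length, $precision);
}

echo gcPercent('ATGAAAGCATC', 4), "\n"; // 4 of 11 bases are G or C: prints 36.3636
```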
Count Point Mutations Between Two Sequences
    $BioPHP = new BioPHP();
    echo $BioPHP->countPointMutations('CTGATGATGGGAGGAAATTTCA', 'CTGATGATGCGAGGGAATATCG')."\n"; // prints 4
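Counting point mutations between two equal-length sequences is the Hamming distance: the number of positions at which the two strings differ. A sketch under that assumption (`hammingDistance` is a hypothetical helper, and rejecting unequal lengths is a choice made here, not documented library behavior):

```php
<?php
// Hypothetical helper (not part of BioPHP): count positions where two sequences differ.
function hammingDistance(string $a, string $b): int
{
    if (strlen($a) !== strlen($b)) {
        throw new InvalidArgumentException('Sequences must have equal length.');
    }
    $distance = 0;
    for ($i = 0, $n = strlen($a); $i < $n; $i++) {
        if ($a[$i] !== $b[$i]) {
            $distance++;
        }
    }
    return $distance;
}

echo hammingDistance('CTGATGATGGGAGGAAATTTCA', 'CTGATGATGCGAGGGAATATCG'), "\n"; // prints 4
```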
Translate DNA Sequence to Amino Acid Sequence
    $BioPHP = new BioPHP();
    echo $BioPHP->translateDna('CTGATGATGGGAGGAAATTTCAGA')."\n"; // prints LMMGGNFR
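Translation reads the sequence three bases at a time and maps each codon to an amino acid. The sketch below uses the standard genetic code, stored compactly as a 64-character string indexed by base order T, C, A, G. `translateDnaSketch` is a hypothetical name; emitting `*` for stop codons, `X` for unrecognized codons, and ignoring a trailing partial codon are assumptions, since the README does not say how BioPHP handles those cases.

```php
<?php
// Hypothetical helper (not part of BioPHP): translate DNA with the standard genetic code.
function translateDnaSketch(string $dna): string
{
    $bases = 'TCAG';
    // Amino acids for codons TTT, TTC, TTA, TTG, TCT, ... GGG (base order T, C, A, G).
    $aminoAcids = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
    $dna = strtoupper($dna);
    $protein = '';
    for ($i = 0, $n = strlen($dna); $i + 3 <= $n; $i += 3) {
        $first = strpos($bases, $dna[$i]);
        $second = strpos($bases, $dna[$i + 1]);
        $third = strpos($bases, $dna[$i + 2]);
        if ($first === false || $second === false || $third === false) {
            $protein .= 'X'; // assumption: mark codons with unknown bases
            continue;
        }
        $protein .= $aminoAcids[16 * $first + 4 * $second + $third];
    }
    return $protein;
}

echo translateDnaSketch('CTGATGATGGGAGGAAATTTCAGA'), "\n"; // prints LMMGGNFR
```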
Calculate Monoisotopic Mass
    $BioPHP = new BioPHP();
    $proteinSequence = $BioPHP->translateDna('CTGATGATGGGAGGAAATTTCAGA');
    echo $BioPHP->calcMonoIsotopicMass($proteinSequence)."\n"; // prints 906.42041
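The figure above is the sum of the monoisotopic residue masses of the peptide, with no water added for the termini. A sketch of that sum using the commonly tabulated residue masses; `monoisotopicMass` is a hypothetical helper, and rejecting unknown residues is a choice made here.

```php
<?php
// Hypothetical helper (not part of BioPHP): sum monoisotopic residue masses (in Da).
function monoisotopicMass(string $protein): float
{
    $masses = [
        'A' => 71.03711,  'C' => 103.00919, 'D' => 115.02694, 'E' => 129.04259,
        'F' => 147.06841, 'G' => 57.02146,  'H' => 137.05891, 'I' => 113.08406,
        'K' => 128.09496, 'L' => 113.08406, 'M' => 131.04049, 'N' => 114.04293,
        'P' => 97.05276,  'Q' => 128.05858, 'R' => 156.10111, 'S' => 87.03203,
        'T' => 101.04768, 'V' => 99.06841,  'W' => 186.07931, 'Y' => 163.06333,
    ];
    $total = 0.0;
    foreach (str_split(strtoupper(trim($protein))) as $residue) {
        if (!isset($masses[$residue])) {
            throw new InvalidArgumentException("Unknown residue: $residue");
        }
        $total += $masses[$residue];
    }
    return $total;
}

echo number_format(monoisotopicMass('LMMGGNFR'), 5, '.', ''), "\n"; // prints 906.42041
```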
Finding a Motif in DNA
    $BioPHP = new BioPHP();
    echo $BioPHP->findMotifDNA('ATAT', 'GTATATCTATATGGCCATAT')."\n"; // prints 3 9 17
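The positions in the example are 1-based. A plain-PHP sketch of the search follows; `findMotifPositions` is a hypothetical helper, and it advances one character after each hit so that overlapping occurrences are also reported, which is an assumption about the intended behavior.

```php
<?php
// Hypothetical helper (not part of BioPHP): 1-based start positions of a motif.
function findMotifPositions(string $motif, string $sequence): array
{
    if ($motif === '') {
        return [];
    }
    $positions = [];
    $offset = 0;
    while (($found = strpos($sequence, $motif, $offset)) !== false) {
        $positions[] = $found + 1; // convert 0-based offset to 1-based position
        $offset = $found + 1;      // step one character so overlaps are found too
    }
    return $positions;
}

echo implode(' ', findMotifPositions('ATAT', 'GTATATCTATATGGCCATAT')), "\n"; // prints 3 9 17
```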
Get Reading Frames
    $BioPHP = new BioPHP();
    print_r( $BioPHP->getReadingFrames('GTATATCTATATGGCCATAT') );
    /*
     * returns array containing...
     Array
     (
         [0] => GTATATCTATATGGCCATAT
         [1] => TATATCTATATGGCCATAT
         [2] => ATATCTATATGGCCATAT
     )
     */
    // Tip: to get all 6 reading frames, also pass the reverse complement
    // (built with the reverse and complement methods) to getReadingFrames().
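The tip about all six reading frames can be sketched in plain PHP: three offsets on the forward strand, then three on the reverse complement. `sixReadingFrames` is a hypothetical helper, not a BioPHP method.

```php
<?php
// Hypothetical helper (not part of BioPHP): the three forward frames followed by
// the three frames of the reverse complement.
function sixReadingFrames(string $dna): array
{
    $forward = strtoupper($dna);
    $reverse = strtr(strrev($forward), ['A' => 'T', 'T' => 'A', 'G' => 'C', 'C' => 'G']);
    $frames = [];
    foreach ([$forward, $reverse] as $strand) {
        for ($offset = 0; $offset < 3; $offset++) {
            $frames[] = substr($strand, $offset);
        }
    }
    return $frames;
}

print_r(sixReadingFrames('GTATATCTATATGGCCATAT'));
```

The first three entries match the `getReadingFrames()` output shown above; the last three start from `ATATGGCCATATAGATATAC`, the reverse complement.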
Find Most Likely Common Ancestor
    $fastaSequence = "
    >Sequence 1
    ATCCAGCT
    >Sequence 2
    GGGCAACT
    >Sequence 3
    ATGGATCT
    ";
    $BioPHP = new BioPHP();
    $fastaArray = $BioPHP->readFasta($fastaSequence); // read and parse the sequences
    echo $BioPHP->mostLikelyCommonAncestor($fastaArray)."\n"; // prints ATGCAACT
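One way to arrive at such a result is a per-column consensus: for each position, take the most frequent base across all sequences. The sketch below is an assumption about the approach, not a description of BioPHP's implementation. `consensus` is a hypothetical helper that takes plain strings of equal length rather than a parsed FASTA array, and it breaks ties in A, C, G, T order.

```php
<?php
// Hypothetical helper (not part of BioPHP): column-wise majority base.
// Assumes all sequences have the same length as the first one.
function consensus(array $sequences): string
{
    if ($sequences === []) {
        return '';
    }
    $result = '';
    for ($col = 0, $length = strlen($sequences[0]); $col < $length; $col++) {
        $counts = ['A' => 0, 'C' => 0, 'G' => 0, 'T' => 0];
        foreach ($sequences as $sequence) {
            $base = $sequence[$col] ?? '';
            if (isset($counts[$base])) {
                $counts[$base]++;
            }
        }
        // Strict ">" keeps the earliest base in A, C, G, T order on a tie.
        $best = 'A';
        foreach ($counts as $base => $count) {
            if ($count > $counts[$best]) {
                $best = $base;
            }
        }
        $result .= $best;
    }
    return $result;
}

echo consensus(['ATCCAGCT', 'GGGCAACT', 'ATGGATCT']), "\n"; // prints ATGCAACT
```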
Get a FASTA Result from UniProt and Calculate Monoisotopic Mass
    $BioPHP = new BioPHP();
    $uniprotFasta = $BioPHP->getUniprotFastaByID("B5ZC00"); // returns the result from UniProt as a string
    $fastaArray = $BioPHP->readFasta($uniprotFasta); // parses the response
    echo $BioPHP->calcMonoIsotopicMass($fastaArray[0]['sequence'])."\n"; // prints 55319.0636
Find protein motif using a variable "shorthand" motif search
    $BioPHP = new BioPHP();
    $results = $BioPHP->findMotifProtein("N{P}[ST]{P}", "B5ZC00");
    print_r($results);
    /*
     * returns array containing...
     Array
     (
         [0] => 85
         [1] => 118
         [2] => 142
         [3] => 306
         [4] => 395
     )
     */
    // Note: the second parameter is a protein accession ID string, used to look up the full sequence via UniProt.
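In this shorthand, `[ST]` means "S or T" and `{P}` means "any residue except P", so the motif maps directly onto a regular expression. The sketch below shows one way to do that conversion; `motifToRegex` and `findProteinMotif` are hypothetical helpers, the function takes the protein sequence itself rather than an accession ID (no UniProt lookup), and the test peptide is made up for illustration.

```php
<?php
// Hypothetical helper (not part of BioPHP): turn shorthand such as N{P}[ST]{P}
// into a PCRE pattern. {XYZ} becomes the negated class [^XYZ]; [XY] is already valid.
function motifToRegex(string $motif): string
{
    $body = preg_replace('/\{([A-Z]+)\}/', '[^$1]', $motif);
    // A lookahead matches without consuming characters, so overlapping hits are reported.
    return '/(?=' . $body . ')/';
}

// Hypothetical helper: 1-based positions of the motif in a protein sequence.
function findProteinMotif(string $motif, string $protein): array
{
    preg_match_all(motifToRegex($motif), $protein, $matches, PREG_OFFSET_CAPTURE);
    return array_map(fn($match) => $match[1] + 1, $matches[0]);
}

// Made-up peptide: N-G-S-A matches at position 2, N-P-T-N is rejected, N-K-T-A matches at 9.
print_r(findProteinMotif('N{P}[ST]{P}', 'MNGSANPTNKTA'));
```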
Finding a shared motif
This task can be very CPU-intensive. Under PHP 7, this method benchmarked faster than Python: runtime was about 1 second for a collection of 100 DNA strings of 1 kbp each.
    $fasta = "
    >Sequence 1
    GATTACA
    >Sequence 2
    TAGACCA
    >Sequence 3
    ATACA";
    $BioPHP = new BioPHP();
    $fastaArray = $BioPHP->readFasta($fasta);
    $result = $BioPHP->findLongestSharedMotif($fastaArray);
    echo $result."\n"; // prints TA
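A straightforward way to find a longest shared motif is brute force: enumerate substrings of the shortest sequence from longest to shortest and return the first one present in every other sequence. The sketch below illustrates that idea only; it is not necessarily how BioPHP implements it. `longestSharedMotif` is a hypothetical helper that takes plain strings, and when several motifs share the maximum length it returns whichever comes first in the shortest sequence.

```php
<?php
// Hypothetical helper (not part of BioPHP): longest substring common to all sequences.
function longestSharedMotif(array $sequences): string
{
    if ($sequences === []) {
        return '';
    }
    // Any shared motif must be a substring of the shortest sequence.
    usort($sequences, fn($a, $b) => strlen($a) <=> strlen($b));
    $shortest = array_shift($sequences);
    $n = strlen($shortest);
    for ($length = $n; $length > 0; $length--) {
        for ($start = 0; $start + $length <= $n; $start++) {
            $candidate = substr($shortest, $start, $length);
            foreach ($sequences as $sequence) {
                if (strpos($sequence, $candidate) === false) {
                    continue 2; // missing from one sequence: try the next candidate
                }
            }
            return $candidate;
        }
    }
    return '';
}

echo longestSharedMotif(['GATTACA', 'TAGACCA', 'ATACA']), "\n"; // prints TA
```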
Find Open Reading Frames from a DNA Sequence
    $Sequence = ">Test DNA Sequence
    TCCCCGGACTCCAAACGCTCGGTAGCCGCCCCTGCTCGACATATTTAGCTCCCTGCATTG
    ACGCCCTGGCAGCCCCGATCAATTTTCGTGGTTAAACGCGCGCTCGCAAGGGACATCGAC
    CGGACCACAGAGCATAGCATGCCTTAGGATCGCCTGTCACTGTTCGTCTCCCTATTTGAG
    CACTGTAGCCCCTGGTACCCCCGTCCTGAAGCGTGTGTGATACACGGTCTGCCCAAGATG
    ";
    $BioPHP = new BioPHP();
    $results = $BioPHP->printORFProteins($Sequence);
    print_r($results);
    /*
     * Returns the following array
     Array
     (
         [0] => MP
         [1] => MLCGPVDVPCERAFNHEN
         [2] => MLCSVVRSMSLASARLTTKIDRGCQGVNAGS
         [3] => MSLASARLTTKIDRGCQGVNAGS
     )
     */
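An open reading frame starts at an ATG codon and runs, codon by codon, to the first stop codon in the same frame. The sketch below scans both strands for every ATG and keeps each distinct protein that ends in a stop. It is a hedged illustration rather than BioPHP's implementation: `orfProteins` is a hypothetical helper that takes a raw DNA string (no FASTA header) and rejects anything outside A, C, G, T.

```php
<?php
// Hypothetical helper (not part of BioPHP): distinct proteins from ORFs on both strands.
function orfProteins(string $dna): array
{
    $bases = 'TCAG';
    // Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
    $aminoAcids = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
    $forward = strtoupper($dna);
    if (preg_match('/[^ACGT]/', $forward)) {
        throw new InvalidArgumentException('Expected only A, C, G, T.');
    }
    $reverse = strtr(strrev($forward), ['A' => 'T', 'T' => 'A', 'G' => 'C', 'C' => 'G']);
    $found = [];
    foreach ([$forward, $reverse] as $strand) {
        $length = strlen($strand);
        $start = -1;
        while (($start = strpos($strand, 'ATG', $start + 1)) !== false) {
            $protein = '';
            for ($i = $start; $i + 3 <= $length; $i += 3) {
                $index = 16 * strpos($bases, $strand[$i])
                    + 4 * strpos($bases, $strand[$i + 1])
                    + strpos($bases, $strand[$i + 2]);
                if ($aminoAcids[$index] === '*') {
                    $found[$protein] = true; // keep only ORFs closed by a stop codon
                    break;
                }
                $protein .= $aminoAcids[$index];
            }
        }
    }
    return array_keys($found);
}

print_r(orfProteins('CCATGTTTATGGGGTAAC')); // two ORFs share one stop codon: MFMG and MG
```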
Locating Restriction Sites of Length 4 to 12
    $BioPHP = new BioPHP();
    $results = $BioPHP->findRestrictionSites("TCAATGCATGCGGGTCTATATGCAT", 4, 12);
    // returns an array containing the position and length of each restriction site
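Restriction sites of this kind are reverse palindromes: stretches of DNA that equal their own reverse complement. A brute-force sketch follows. `findReversePalindromes` is a hypothetical helper, and the `position`/`length` array shape is an assumption, since the README does not show the exact structure `findRestrictionSites()` returns.

```php
<?php
// Hypothetical helper (not part of BioPHP): find substrings equal to their reverse complement.
function findReversePalindromes(string $dna, int $minLength = 4, int $maxLength = 12): array
{
    $dna = strtoupper($dna);
    $n = strlen($dna);
    $sites = [];
    for ($start = 0; $start < $n; $start++) {
        for ($length = $minLength; $length <= $maxLength && $start + $length <= $n; $length++) {
            $fragment = substr($dna, $start, $length);
            $reverseComplement = strtr(strrev($fragment), ['A' => 'T', 'T' => 'A', 'G' => 'C', 'C' => 'G']);
            if ($fragment === $reverseComplement) {
                // Positions are reported 1-based.
                $sites[] = ['position' => $start + 1, 'length' => $length];
            }
        }
    }
    return $sites;
}

foreach (findReversePalindromes('TCAATGCATGCGGGTCTATATGCAT', 4, 12) as $site) {
    echo $site['position'], ' ', $site['length'], "\n";
}
```

For the sequence above, this sketch reports eight sites: positions 4, 6, and 20 with length 6, and positions 5, 7, 17, 18, and 21 with length 4.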
Inferring mRNA from Protein: counts the distinct RNA strings from which the protein could have been translated
Note: This method requires the PHP Math_BigInteger package (pear/math_biginteger), which is a Composer dependency of this project.
$BioPHP = new BioPHP(); $result = $BioPHP->inferringMRnaFromProteinCount("MTIFMFHNKNICTEYMGYYDQQIMQTEHKWYWDFHTFMIPNVFYEDVIKFKMRMLMIPNCFFGPWLFCKLEKCQYYEKATEPAPIVKDYTLFATGGAGREATFWPWFWTDENRPKDYYFQRDGLHHRNEPRLPHATCRRAYYQCEMIQYAIVTSCVLLAWKMFTDYGHTGVASEPKEPQEDIKCMKFPHMSWQKTLTEAFYELFPCYPEEFPNDRPWLLGHGFGPIVCTITAIDTTDVAKNIWKAVFRPHAGNWDIGFHSPCASEGCPDIMFPYFTCHDYKGMMCCFNLTMEVCCKQPRPTGIYMMVERMRIMNNREFAGFKHYREEHIKHYWRFGIFASPFVICWSPKTKGPPTSDWYMRDSEVVTQESELKESWQDMMEQHSMFGIPHCEKERWMNDNWKCKLFYYEVILWISNCECDQHVNCCVAHDPGTQVDWAWTLDMWWDQKYFGFFVRKKGQKYNMHWGAPYWLTNPTEKKDFIQHEQLGPLQTFRHCSSPAPT"); echo $result."\n"; //prints 884608
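The count is the product of the number of codons that encode each residue, times 3 for the three possible stop codons. For a protein of this length the full product has hundreds of digits, so the six-digit result above suggests the value is reduced modulo 1,000,000; that reading is an assumption, not something the README states. The sketch below applies the modulus at every step, which keeps the running value within native integer range. `countMrnaStrings` is a hypothetical helper, not the library's BigInteger-based method.

```php
<?php
// Hypothetical helper (not part of BioPHP): number of RNA strings that translate to
// the protein, including a stop codon, reduced modulo $modulus (assumed 1,000,000).
function countMrnaStrings(string $protein, int $modulus = 1000000): int
{
    // Codons per amino acid in the standard genetic code (61 sense codons in total).
    $codonCounts = [
        'A' => 4, 'C' => 2, 'D' => 2, 'E' => 2, 'F' => 2, 'G' => 4, 'H' => 2,
        'I' => 3, 'K' => 2, 'L' => 6, 'M' => 1, 'N' => 2, 'P' => 4, 'Q' => 2,
        'R' => 6, 'S' => 6, 'T' => 4, 'V' => 4, 'W' => 1, 'Y' => 2,
    ];
    $total = 3 % $modulus; // three stop codons can terminate the message
    for ($i = 0, $n = strlen($protein); $i < $n; $i++) {
        $residue = $protein[$i];
        if (!isset($codonCounts[$residue])) {
            throw new InvalidArgumentException("Unknown residue: $residue");
        }
        // Reducing after every multiplication keeps the value small.
        $total = ($total * $codonCounts[$residue]) % $modulus;
    }
    return $total;
}

echo countMrnaStrings('MA'), "\n"; // 1 codon for M, 4 for A, 3 stops: prints 12
```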