Wednesday, March 30, 2016

LeetCode Q187: Repeated DNA Sequences

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
For example,
Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",

Return:
["AAAAACCCCC", "CCCCCAAAAA"].

Solution 1:
Using unordered_map to save substring.

class Solution {
public:
vector<string> findRepeatedDnaSequences(string s) {
unordered_map<string, int> myMap;
vector<string> res;
if(s.length()<=10)
return res;
for(int i=0; i<=s.length()-10; i++){
string substring = s.substr(i, 10);
if(myMap[substring]==0){
myMap[substring]=1;
}else{
if(myMap[substring]==1){
res.push_back(substring);
myMap[substring]++;
}
else{
if(myMap[substring]>1)
continue;
}
}
}
return res;
}
};
view raw Q187-1.cpp hosted with ❤ by GitHub

Solution 2:
Converting substring to integer and hash integer instead of substring. Need to use different code for 'A', 'C', 'G', 'T'.

class Solution {
public:
int str2int(string s){
int str=0;
for(int i=0; i<s.length(); i++){
if(s[i]=='A')
str = str<<2 | 0;
if(s[i]=='C')
str = str<<2 | 1;
if(s[i]=='G')
str = str<<2 | 2;
if(s[i]=='T')
str = str<<2 | 3;
}
return str;
}
vector<string> findRepeatedDnaSequences(string s) {
unordered_map<int, int> myMap;
vector<string> res;
for(int i=0; i+10<=s.size(); i++){
if(myMap[str2int(s.substr(i, 10))]++==1)
res.push_back(s.substr(i, 10));
}
return res;
}
};
view raw Q187-2.cpp hosted with ❤ by GitHub

No comments:

Post a Comment