|
In its "Prefix sum" section, Codility proposes the following problem. I don't see how to solve it within the constraints, which are O(N+M) time and O(N) space complexity. I know I should use the prefix sum technique, but I would need more hints. I'd rather not be given the solution, but hints that would help guide my thought process.
--
A DNA sequence can be represented as a string consisting of the letters A, C, G and T, which correspond to the types of successive nucleotides in the sequence. Each nucleotide has an impact factor, which is an integer. Nucleotides of types A, C, G and T have impact factors of 1, 2, 3 and 4, respectively. You are going to answer several queries of the form: What is the minimal impact factor of nucleotides contained in a particular part of the given DNA sequence?
The DNA sequence is given as a non-empty string S = S[0]S[1]...S[N-1] consisting of N characters. There are M queries, which are given in non-empty arrays P and Q, each consisting of M integers. The K-th query (0 ≤ K < M) requires you to find the minimal impact factor of nucleotides contained in the DNA sequence between positions P[K] and Q[K] (inclusive).
For example, consider string S = CAGCCTA and arrays P, Q such that:
P[0] = 2 Q[0] = 4
P[1] = 5 Q[1] = 5
P[2] = 0 Q[2] = 6
The answers to these M = 3 queries are as follows:
The part of the DNA between positions 2 and 4 contains nucleotides G and C (twice), whose impact factors are 3 and 2 respectively, so the answer is 2.
The part between positions 5 and 5 contains a single nucleotide T, whose impact factor is 4, so the answer is 4.
The part between positions 0 and 6 (the whole string) contains all nucleotides, in particular nucleotide A whose impact factor is 1, so the answer is 1.
Write a function:
def solution(S, P, Q)
that, given a non-empty zero-indexed string S consisting of N characters and two non-empty zero-indexed arrays P and Q consisting of M integers, returns an array consisting of M integers specifying the consecutive answers to all queries.
The sequence should be returned as:
a Results structure (in C), or
a vector of integers (in C++), or
a Results record (in Pascal), or
an array of integers (in any other programming language).
For example, given the string S = CAGCCTA and arrays P, Q such that:
P[0] = 2 Q[0] = 4
P[1] = 5 Q[1] = 5
P[2] = 0 Q[2] = 6
the function should return the values [2, 4, 1], as explained above.
Assume that:
N is an integer within the range [1..100,000];
M is an integer within the range [1..50,000];
each element of arrays P, Q is an integer within the range [0..N − 1];
P[K] ≤ Q[K], where 0 ≤ K < M;
string S consists only of upper-case English letters A, C, G, T.
Complexity:
expected worst-case time complexity is O(N+M);
expected worst-case space complexity is O(N) (not counting the storage required for input arguments).
Elements of input arrays can be modified.
modified 6-Nov-17 5:26am.
|
|
|
|
|
S is a string; or an array of characters; characters which "convert" to impact numbers.
P and Q are (3) "substring ranges" into S.
The subset / substring of S will contain one or more "MIN" impact numbers.
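To make the hint concrete (spoiler warning: this goes beyond a hint), here is a rough Python sketch of one way the "convert to impact numbers" idea combines with prefix sums; it is one possible approach, not necessarily the intended model solution:

```python
def solution(S, P, Q):
    # counts[c][i] = number of occurrences of nucleotide c in S[0..i-1],
    # so c occurs somewhere in S[p..q] iff counts[c][q+1] - counts[c][p] > 0.
    impact = {'A': 1, 'C': 2, 'G': 3, 'T': 4}
    n = len(S)
    counts = {c: [0] * (n + 1) for c in "ACGT"}
    for i, ch in enumerate(S):
        for c in "ACGT":
            counts[c][i + 1] = counts[c][i] + (ch == c)
    result = []
    for p, q in zip(P, Q):
        # check nucleotides from cheapest to most expensive impact factor
        for c in "ACGT":
            if counts[c][q + 1] - counts[c][p] > 0:
                result.append(impact[c])
                break
    return result
```

Building the four prefix arrays is O(N), and each query is a constant number of lookups, so the whole thing meets the O(N+M) bound.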
"(I) am amazed to see myself here rather than there ... now rather than then".
― Blaise Pascal
|
|
|
|
|
Wow.
There are so many things I love about this question. I'm going to save it. And I'm going to try, if not to solve it, then to demonstrate why I can't solve it, using code that consumes some finite computation space or time.
I'm mucking about with a self-modifying linear bounded automaton interpreter/compiler I've built, and I am crap at most math, but just good enough at the right kinds of math to know how to build this problem and express it through that.
And you've asked what may be exactly the question I need.
How many times can someone say that?
Thank you =)
|
|
|
|
|
Hi,
I'm pretty rusty with algorithms and I was posed this question: how do I find the greatest number of pages I can buy from a list of books within a certain budget? I got as far as sorting the list by cost, then by number of pages. Then I was thinking of creating a map with the cost as key and a list of the books at each price, then sorting each key's list by number of pages. I think I'm headed in the right direction, but I'm not quite sure how to complete the exercise. Can someone help me or point me toward the correct algorithms I should be using? Cheers!
|
|
|
|
|
This is homework, and we don't do that for you - it's an exercise to get you thinking about problems, and working out how to solve them. If we give you the "correct algorithm" then that defeats the point of giving you the homework!
So instead, sit down with a pencil and paper, and work out how you would do it for yourself manually.
Hint: would knowing the price per page help you come up with a solution?
Bad command or file name. Bad, bad command! Sit! Stay! Staaaay...
AntiTwitter: @DalekDave is now a follower!
|
|
|
|
|
It's not homework. It's something I'm trying to implement in an Android app I'm building! It's an algorithm for a built-in calculator to check how many pages I can buy from a list of comics from the Marvel API.
|
|
|
|
|
You first have to think about what is important to you:
- a particular order
- the least money left over
- the maximum number of books
Depending on this choice, you would build different lists.
To be able to decide which list could be the best, you must build different sub-lists. For example:
- you have €100 and could buy Book 1 (€40), Book 2 (€35), Book 3 (€45), Book 4 (€20)
- now one list could be Books 1, 2, 4; another could be Books 2, 3, 4; or Books 1, 3 ...
Now ... which list contains the best result for you?
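The sub-list idea above can be sketched as a brute-force enumeration (the page counts below are invented for illustration; this checks every affordable combination, which is only feasible for small lists):

```python
from itertools import combinations

# (name, price in EUR, pages) -- page counts are made up for the example
books = [("Book 1", 40, 300), ("Book 2", 35, 250),
         ("Book 3", 45, 500), ("Book 4", 20, 150)]
budget = 100

best = None  # (combo, pages, cost) of the best affordable list found so far
for r in range(1, len(books) + 1):
    for combo in combinations(books, r):
        cost = sum(price for _, price, _ in combo)
        total = sum(pg for _, _, pg in combo)
        if cost <= budget and (best is None or total > best[1]):
            best = (combo, total, cost)
```

For anything beyond a handful of books this explodes combinatorially; a dynamic-programming formulation is the scalable route.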
|
|
|
|
|
It's based on the most pages in the books I can buy for a certain budget. I think it's the 0/1 knapsack problem.
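If it is indeed 0/1 knapsack, with price as the weight and pages as the value, the textbook dynamic-programming sketch looks roughly like this (book data in the test usage is hypothetical):

```python
def max_pages(books, budget):
    """books: list of (price, pages); budget: whole currency units.
    Returns the most pages obtainable spending at most `budget`."""
    # dp[b] = best page total achievable with budget b
    dp = [0] * (budget + 1)
    for price, pages in books:
        # iterate budgets downward so each book is bought at most once
        for b in range(budget, price - 1, -1):
            dp[b] = max(dp[b], dp[b - price] + pages)
    return dp[budget]
```

This runs in O(number of books × budget); if prices aren't integers, scale them to the smallest currency unit first.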
|
|
|
|
|
Hi, I have 4 arrays, where array i has Ni elements (i = 1..4). The Ni are fixed and very large. Each time, I need to sample without replacement from each of them. The sample sizes (say M) are the same for the 4 samples and are fixed and large as well (though each Ni is bigger than M). I then calculate a single number (say X) from these 4 samples, which takes a long time (normally 5-8 hours). I need to find the 4-sample set that minimizes X. What is the best search strategy? Thanks in advance.
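Not knowing more about the structure of X, one common starting point for a budget this tight is a simple local search: evaluate a random starting set, then repeatedly swap one sampled element for an unsampled one and keep the swap only if X improves. A rough sketch (the `evaluate` callback stands in for your 5-8 hour computation; all names here are illustrative):

```python
import random

def local_search(arrays, M, evaluate, iterations=100, rng=None):
    """Minimize evaluate(samples), where samples[i] is an M-element
    sample without replacement from arrays[i]."""
    rng = rng or random.Random()
    samples = [rng.sample(a, M) for a in arrays]
    best = evaluate(samples)
    for _ in range(iterations):
        i = rng.randrange(len(arrays))            # pick one of the 4 samples
        candidate = samples[i][:]
        candidate[rng.randrange(M)] = rng.choice(arrays[i])
        if len(set(candidate)) < M:               # keep "without replacement"
            continue
        trial = samples[:i] + [candidate] + samples[i + 1:]
        x = evaluate(trial)                       # the expensive step
        if x < best:
            best, samples = x, trial
    return best, samples
```

At 5-8 hours per evaluation you can only afford a handful of iterations, so it may be worth parallelizing trials, or fitting a cheap surrogate model to past (sample, X) pairs (the idea behind Bayesian optimization) before spending real evaluations.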
modified 5-Nov-17 6:01am.
|
|
|
|
|
Hi folks, I've tried Googling for an answer to this but keep getting directed down network load balancing which, although related, is not my scenario.
My scenario: I have 3 products (x,y and z) which take 2000, 4000 and 5000 labour hours to assemble respectively. If I want to produce say 50 of product x, 40 of product y and 60 of product z in a year is there an algorithm for distributing the production (the loading) so that the labour hours are spread as evenly as possible throughout the year? I'm quite happy to be offered some additional reading / guidance or a general solution (teach a man to fish etc.).
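One simple greedy heuristic for this kind of levelling (a sketch, not guaranteed optimal) is the longest-processing-time rule: sort the individual build jobs from largest to smallest and always place the next one in the currently least-loaded week. Using the figures above:

```python
def level_load(jobs, periods=52):
    """jobs: one (name, hours) entry per unit to build.
    Greedily place the largest remaining job in the least-loaded period."""
    loads = [0] * periods
    schedule = [[] for _ in range(periods)]
    for name, hours in sorted(jobs, key=lambda j: -j[1]):
        week = min(range(periods), key=loads.__getitem__)
        loads[week] += hours
        schedule[week].append(name)
    return loads, schedule

# 50 units of x (2000 h), 40 of y (4000 h), 60 of z (5000 h)
jobs = [("x", 2000)] * 50 + [("y", 4000)] * 40 + [("z", 5000)] * 60
loads, schedule = level_load(jobs)
```

This assumes each unit must be built within a single week; if work can be split across week boundaries, the spread can be made almost perfectly flat (560,000 h / 52 ≈ 10,769 h per week).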
That's always assuming there is an answer of course .
Andy B
modified 1-Nov-17 11:21am.
|
|
|
|
|
I am not sure whether this is what you want, but I really think you need to learn about cost/loss functions to decide how to manage the resources to minimize the cost and maximize the outputs. If I had paid more attention in class, I would have learnt how the maximization or cost-reduction algorithms work, but... Cost function - Wikipedia.
The following resources will help you understand the overall algorithms so you can then write your own:
Cost Function - Coursera.org
Mathematical optimization - Wikipedia
The sh*t I complain about
It's like there ain't a cloud in the sky and it's raining out - Eminem
~! Firewall !~
|
|
|
|
|
Thanks Afzaal, the cost function is certainly what I'd like to end up with but when I look into the articles it quickly gets to a level of mathematical notation that I'm not comfortable with, despite being an engineering graduate (long time ago though).
I think in this case I'm going to have to approximate something by working out each product's weekly average, then phase-shifting them by a number of days each to see if I can derive a function from this somehow. It won't be optimal, but then real-world production usually isn't .
The links reference some fascinating material though, it's certainly an area I'd like to explore in more mathematical detail at some point.
Andy B
|
|
|
|
|
Or, on the other hand, you could consider looking for a library that lets you do this. I don't know of any, but a quick Google search will easily yield a few results.
The sh*t I complain about
It's like there ain't a cloud in the sky and it's raining out - Eminem
~! Firewall !~
|
|
|
|
|
|
Thanks Gerry but in this instance I'll be programming in a non-Microsoft environment, so making API calls will not be possible, or at least not easily. I have used the Excel solver function in the past for optimization exercises and it is very effective.
|
|
|
|
|
Don't get me wrong. I believe Codility proposes interesting problems. This tool might even be a good way of testing candidates' algorithmic skills (not programming skills). My frustration is elsewhere: it's hard, even when it pretends otherwise.
Problems fall into one of three categories: "painless", "respectable", and "ambitious".
My experience, however, is that few of the painless ones actually are painless. Most of them require a respectable amount of time to solve. A lot of the problems conceal some trick that you need to somehow discover, and you cannot get any hints. Solutions on the internet will instantly spoil the whole exercise instead of giving you hints progressively.
Now, even when you get the algorithm right, you are far from done, because the idea alone isn't worth a penny. The implementation might actually take you more time than finding the algorithm itself. Because: indexes. You misplaced one incrementation, you forgot one, or whatever. The smallest mistake will quickly drop your score to 0%; you failed, goodbye. You had the right idea, though; it's a shame.
And the worst part is: Google won't find anybody that had or is having a similar experience. I'm alone, and it doesn't feel good.
|
|
|
|
|
One wrong "increment" can destroy a $$$ industrial process; or kill a patient.
Better on paper, than in the real world.
"(I) am amazed to see myself here rather than there ... now rather than then".
― Blaise Pascal
|
|
|
|
|
.. or using the wrong units can prang some very expensive hardware a long long way from home.
Software rusts. Simon Stephenson, ca 1994. So does this signature. me, 2012
|
|
|
|
|
Of course, but that's why you have more than a couple of hours to think about things, and you have discussions with your peers, and peer reviewing, ...
But that wasn't exactly my point.
|
|
|
|
|
Background:
I developed a 2-d simulation that allows for the training of surgical technologists in the setup of surgical trays. The student is able to pick a tool from a toolbox and drag it onto the surface of the tray and orient it the way it needs to be for the surgery. The grading used a nearest neighbor algorithm using data from a tray set up in the graphical editor by an instructor or subject matter expert. I have available the coordinates of each image as well as the height and width and the orientation of the image. It works okay for that because we had a set of rules guiding the distance the tool can be away from a right answer and still be right.
Problem:
Now, I have to do a similar application for another purpose. The catch is that the grading is more "fuzzy". They don't want grading by the exact location of the user's answer as compared to the right answer. They just want the objects graded based on the approximate location of each object with regard to the other objects on the tray, and they should be in the correct order. I have tried to figure this out but have come up short: I can always find a case that would throw the whole thing off. I was wondering if anyone here has any ideas?
Thanks in advance,
Joe
|
|
|
|
|
You said: "based on order".
There needs to be an "ideal order". Once that's established, you can determine what constitutes a "variance" from this "ideal", how it's measured, etc.
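One concrete way to measure that variance, assuming the tray layout can be projected onto a single left-to-right sequence, is to count how many pairs of instruments appear in the opposite relative order to the ideal (a Kendall-tau-style distance). A sketch:

```python
def order_distance(ideal, observed):
    """Number of item pairs whose relative order in `observed` disagrees
    with `ideal`. 0 = perfect order; n*(n-1)/2 = completely reversed."""
    pos = {item: i for i, item in enumerate(observed)}
    inversions = 0
    for i in range(len(ideal)):
        for j in range(i + 1, len(ideal)):
            if pos[ideal[i]] > pos[ideal[j]]:
                inversions += 1
    return inversions
```

Grading can then map the inversion count onto score bands rather than demanding exact coordinates, which matches the "approximate location, correct order" requirement.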
"(I) am amazed to see myself here rather than there ... now rather than then".
― Blaise Pascal
|
|
|
|
|
The definition of a "right" answer seems reasonably simple: are they in the right order, and do they come within a minimum distance of another instrument?
Never underestimate the power of human stupidity
RAH
|
|
|
|
|
In the following problem, which algorithm should be considered?
Given a finite set of variable-height bricks, how can I find a subset that reaches at least a given height while minimizing the total height of the chosen bricks?
In other words, which bricks should I use to reach a given height minimizing cost of material?
Thanks
|
|
|
|
|
|
Yes, I thought about that algorithm for my purpose.
But the two objectives are different: in the knapsack problem you maximize a value subject to a cost constraint; the value and the cost are two different quantities.
In this case the value and the cost are one and the same, and you have to minimize that value under the constraint that it is at least a given quantity.
Maybe I can apply the same algorithm, but I cannot see how to reduce my problem to the knapsack one.
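For what it's worth, one possible reduction (a sketch, assuming integer heights): since height is both the cost and the value here, the problem becomes "find the smallest achievable subset sum that is at least the target", and a plain subset-sum DP answers that directly:

```python
def min_total_height(bricks, target):
    """Smallest subset sum of `bricks` that is >= target, or None if
    even all bricks together fall short. Sums above target + max(bricks)
    can never be optimal: dropping a brick would still reach the target."""
    if sum(bricks) < target:
        return None
    bound = target + max(bricks)
    reachable = [False] * (bound + 1)
    reachable[0] = True
    for brick in bricks:
        for s in range(bound, brick - 1, -1):  # 0/1: each brick used once
            if reachable[s - brick]:
                reachable[s] = True
    return min(s for s in range(target, bound + 1) if reachable[s])
```

For example, `min_total_height([5, 3, 8, 2], 9)` returns 10 (via 8 + 2 or 5 + 3 + 2). Runtime is O(len(bricks) × (target + max brick)).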
|
|
|
|
|