Re: [Statisticians_group] help to determine the best statistical approach for this data

Dear […]

Thanks for raising problems that many of us face. I want to offer some general observations for users of statistics (which may bear only indirectly on your problem) and then discuss your problem specifically.

People associated with statistics can be divided into two groups: developers and users. The challenges for each are different. Developers of statistical methods work from assumptions, while users work with ground reality. In the early phase of the development of statistics this gap was narrow, but it is now widening.

As users of statistics, our primary aim should be to enrich the domain concerned. One can do this in the following steps:

(1)     Check which abstract ideas and beliefs prevail in the domain concerned.

(2)     Think about how these beliefs and abstract ideas (based on intuition) may be represented through data. Four things are important here: (a) which characteristics (such as caste, land, welfare) should be measured on which unit (household, community, etc.); (b) how these characteristics should be represented through data; (c) which characteristics are dependent and which are independent; (d) how the independent characteristics are related: additively, interactively, etc. This is a very crucial step. It is pertinent to mention here that there may be more than one way to represent an abstract idea (and its characteristics). For example, the welfare (a characteristic) of a household may be represented through data in many ways. Similarly, there may be different theories (sets of independent variables) to explain production. So the basic model comes from expertise in the domain. Statistical tools should be used to estimate (and test) the parameters of the model so that comparisons can be made. A statistician can also help in searching for a better model by including more suitable characteristics or by taking different functions of the characteristics.

(3)     Collect data that appear relevant, following statistical methods as far as possible.

(4)     Use statistical tools to explore the data and to estimate and test the parameters of the model.

(5)     Revise the initial model so that the data support it better.

 

It is a ground reality that there may be limitations on the use of various statistical methods. What you have to do is state all such limitations in your report. For example, you are using secondary data and they are not a random sample; in this case you should mention the possible sources of bias. See the attached file “How to lie with statistics.pdf”.

 

The problem for the pure statistician is that socio-economic data are generally not suitable for advanced statistical tools. For the applied statistician, creativity lies in using the tools of one domain in another. For example, the life table method is generally used by demographers, but it may also be used to understand dropout in education. Similarly, hazard-based models used in medicine may be used in economics. The real challenge before the pure statistician is to gain sufficient expertise in different domains quickly, explore whatever the data can say, and publish it along with its limitations.
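
To make the life table example concrete, here is a rough sketch of how the method might be applied to school dropout. The cohort size and dropout counts are invented purely for illustration, and the function name `life_table` is hypothetical; a real analysis would also have to deal with transfers, repetition and censoring.

```python
def life_table(enrolled_at_start, dropouts_by_grade):
    """Build a simple cohort life table for school dropout.

    enrolled_at_start: pupils entering grade 1 (hypothetical cohort).
    dropouts_by_grade: pupils leaving during each grade, in grade order.
    Returns one row per grade: (grade, at_risk, dropouts, q, survival),
    where q is the conditional dropout probability in that grade and
    survival is the share of the original cohort completing the grade.
    """
    rows = []
    at_risk = enrolled_at_start
    for grade, dropped in enumerate(dropouts_by_grade, start=1):
        q = dropped / at_risk if at_risk else 0.0
        remaining = at_risk - dropped
        survival = remaining / enrolled_at_start
        rows.append((grade, at_risk, dropped, q, survival))
        at_risk = remaining
    return rows


# Invented numbers, for illustration only.
table = life_table(1000, [50, 95, 171, 68])
for grade, at_risk, dropped, q, survival in table:
    print(f"grade {grade}: at risk={at_risk}, dropouts={dropped}, "
          f"q={q:.3f}, survival={survival:.3f}")
```

The column `q` plays the role that the probability of dying in an age interval plays in a demographic life table, and `survival` plays the role of the survivorship column.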

Generally, without acquiring domain expertise (or the collaboration of a domain expert), we want to apply statistical tools in the absolute. One reason is that we are taught by experts (tool developers, not users) who never emphasize the role of context. We are taught in terms of random variables. That is why we think statistical tools may be applied in the absolute.

 

The steps mentioned above are closest to causal modelling, which covers a large proportion of human thinking. There may be other types of modelling (such as those used to explain queues and networks). The steps used for such models will be different.

 

In conclusion, I want to say: search for methods and data according to the needs of the problem, instead of searching for a problem and methods that suit the data. Do not exercise to change your body to fit an already made (sometimes second-hand) shirt (data and method). Better to make a shirt that fits your body. I know this philosophy will not suit many applied statisticians who are under pressure to produce more research papers. Applied statistics in the socio-economic area is a long road, which runs from case studies and participatory (qualitative) surveys to the use of large-scale quantitative survey data collected by others.

Now I come to your questions.

1.       I could not see your graphs; my browser is not opening them (maybe for security reasons). Please send them as an attachment.

2.       I would like to revise my ideas on interval and ratio scales. I could not understand why profit is not on a ratio scale. Is the ratio of two profits not meaningful? Please see

http://dogsbody.psych.mun.ca/VassarStats/webtext.html

or

http://faculty.vassar.edu/lowry/ch1pt1.html (if above is not working)
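
The property this question turns on can be shown in a few lines. The sketch below uses invented values and a hypothetical exchange rate. It does not settle whether profit should be treated as ratio scale; it only shows that ratios survive a change of unit when the zero is a true zero, and do not when the zero is arbitrary.

```python
def ratio(a, b):
    """Ratio of two measurements on the same scale."""
    return a / b


def celsius_to_fahrenheit(c):
    """Same temperature, different unit and a different (arbitrary) zero."""
    return c * 9 / 5 + 32


# Ratio scale: hours have a true zero, so the ratio is the same in minutes.
hours_a, hours_b = 6.0, 3.0
print(ratio(hours_a, hours_b), ratio(hours_a * 60, hours_b * 60))

# Interval scale: Celsius has an arbitrary zero, so the ratio changes
# when the same two temperatures are expressed in Fahrenheit.
celsius_a, celsius_b = 40.0, 20.0
print(ratio(celsius_a, celsius_b),
      ratio(celsius_to_fahrenheit(celsius_a), celsius_to_fahrenheit(celsius_b)))

# Profit: converting two profits to another currency (hypothetical rate)
# rescales both by the same factor, so their ratio is unchanged.
profit_a, profit_b = 200.0, 100.0
exchange_rate = 80.0  # invented, for illustration only
print(ratio(profit_a, profit_b),
      ratio(profit_a * exchange_rate, profit_b * exchange_rate))
```

When one value is a profit and the other a loss, the ratio is negative and harder to interpret, which may be part of the original poster's concern.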

3.       Without the condition of normality, you can still do a lot. For classical regression analysis, normality of the errors is not required if you want to estimate the parameters of the model (with standard errors). Normality is required if you want to test the parameters. Even the estimated values provide a lot of information to enrich knowledge of the domain. For testing, you can suitably transform your dependent variable (and the independent variables, if needed), as Mr. Madan said.
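
The estimation part of this point can be illustrated with a small simulation. This is only a sketch under stated assumptions: the true intercept (1.0) and slope (2.0), the sample size and the skewed error distribution (a shifted exponential with mean zero) are all invented, and `ols_simple` is a hypothetical helper for one predictor.

```python
import math
import random


def ols_simple(x, y):
    """Least-squares fit of y = intercept + slope * x with standard errors."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    sigma2 = sum(r * r for r in residuals) / (n - 2)  # residual variance
    se_slope = math.sqrt(sigma2 / sxx)
    se_intercept = math.sqrt(sigma2 * (1 / n + x_bar ** 2 / sxx))
    return slope, intercept, se_slope, se_intercept


random.seed(0)
x = [random.uniform(0, 10) for _ in range(2000)]
# Errors are strongly skewed (exponential shifted to mean zero), not normal.
y = [1.0 + 2.0 * xi + (random.expovariate(1.0) - 1.0) for xi in x]

slope, intercept, se_slope, se_intercept = ols_simple(x, y)
print(f"slope={slope:.3f} (se {se_slope:.3f}), "
      f"intercept={intercept:.3f} (se {se_intercept:.3f})")
```

The estimates land close to the values used to generate the data even though the errors are far from normal. The sketch says nothing about the validity of exact tests in small samples, which is where the normality assumption matters.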

4.       I would like to see a reference for the requirement of normality for the use of Pearson correlation (I am not posing a question).

 

With regards

Nand Kishore
--- On Sun, 8/9/09, Madan Kundu <madan4331@...> wrote:


From: Madan Kundu <madan4331@...>
Subject: Re: [Statisticians_group] help to determine the best statistical approach for this data
To: Statisticians_group@...
Date: Sunday, August 9, 2009, 8:33 AM

 

Your data seem to follow either an exponential or a lognormal distribution. For distributions of this kind, the best measures of central tendency and dispersion are the geometric mean and the coefficient of variation.
 
Regarding testing of hypotheses, please log-transform your data to make them normal. Then apply appropriate parametric tests.
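
A minimal sketch of these two measures and of the log transform, using invented values for a strictly positive variable such as "hours". The function names are hypothetical. Note that logarithms are only defined for positive numbers, so as written this would not apply directly to "PPS" (which has negative values) or to "DD" (which is never above zero).

```python
import math
import statistics


def geometric_mean(values):
    """Geometric mean via the mean of logs; requires strictly positive data."""
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean needs strictly positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def coefficient_of_variation(values):
    """Sample standard deviation divided by the arithmetic mean."""
    return statistics.stdev(values) / statistics.mean(values)


# Invented durations in hours, for illustration only.
hours = [1.0, 2.0, 4.0, 8.0, 16.0]

gm = geometric_mean(hours)
cv = coefficient_of_variation(hours)
log_hours = [math.log(v) for v in hours]  # transformed data for parametric tests

print(f"arithmetic mean={statistics.mean(hours):.2f}, geometric mean={gm:.2f}, CV={cv:.2f}")
```

For right-skewed data the geometric mean sits below the arithmetic mean, closer to where most of the observations are.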
 
Hope this helps.

Regards
Madan Gopal Kundu


--- On Sun, 9/8/09, bigitop <doctormuniz@gmail.com> wrote:

From: bigitop <doctormuniz@gmail.com>
Subject: [Statisticians_group] help to determine the best statistical approach for this data
To: Statisticians_group@yahoogroups.co.in
Date: Sunday, 9 August, 2009, 9:22 AM

 
Hello All.  I have an "Introduction to Statistics" (biostatistics, actually) college course under my belt. All the basics are good to know for general knowledge but when it comes to actually applying it to specific basic problems, I'm afraid it is just not enough. That's why I need your help.

I have data for three variables with strongly skewed distributions.

The first variable "hours" is the number of hours it takes to achieve an outcome X, specifically, the number of hours to close a financial position. This variable is of the type ratio scale as the values cannot go below zero.

The second variable "PPS" is the actual outcome of the transaction: profit or loss. Positive values represent profit, negative values represent losses. This is interval data.

The last variable is "DD" or the drawdown of the transaction. These are all negative numbers with a maximum value of zero, so this is ratio-scale data.

Below are the graphical summaries generated by MiniTab, and then my questions. Please bear with me!

[Graphical summaries for "hours", "PPS" and "DD" were embedded here as images and are not shown.]

Ok now the questions.

Regarding the descriptive stats:

1. As far as I know in these cases the best measure of central tendency would be the median. Are there other measures that are better?

2. Since the distributions are not normal, the standard deviation is not a good measure in this case. What is the best way to determine the dispersion aside from quartiles? Can I use the absolute deviation from the median or the mean? How reliable is this measure? What do you suggest?
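
For reference, the measures mentioned in these two questions can be computed as in the sketch below. The sample is invented to be right-skewed with one extreme value, and `median_absolute_deviation` is a hypothetical helper; `statistics.quantiles` requires Python 3.8 or later.

```python
import statistics


def median_absolute_deviation(values):
    """Median of the absolute deviations from the median."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])


# Invented, right-skewed sample with one extreme value.
sample = [1, 2, 2, 3, 4, 7, 40]

median = statistics.median(sample)
mad = median_absolute_deviation(sample)
q1, q2, q3 = statistics.quantiles(sample, n=4)  # quartile cut points
iqr = q3 - q1

print(f"mean={statistics.mean(sample):.2f}, sd={statistics.stdev(sample):.2f}")
print(f"median={median}, MAD={mad}, IQR={iqr}")
```

The single extreme value pulls the mean and standard deviation upward, while the median, the median absolute deviation and the interquartile range are barely affected by it.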

Next I would like to infer certain conclusions based on this data. For example: Are transactions that take longer to close more profitable than the ones that take less time to close? How does the duration of the trade relate to the drawdown? Etc.

1. I'm confused about what I should use: parametric or non-parametric tests? On one hand I have interval and ratio data, which tells me I should use parametric tests. On the other hand I have non-normal distributions, which tells me I had better go with non-parametric.

2. Since the data is not normal, I can't use Pearson correlation. Are there any other tests that would be OK to use for this data? My main objective is to determine the relationships between the different variables.

3. What steps can I take to determine the best statistical model to use for my data? Or should I even worry about this?? or a better question: when should I start worrying about the appropriateness of the model?
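
One rank-based option often considered for questions like these is Spearman's rank correlation, which measures monotone rather than linear association. The sketch below is a plain-Python illustration with invented data; `average_ranks`, `pearson` and `spearman` are hypothetical helper names, and it is offered as one possibility rather than a recommendation for this particular data set.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented data: profit rises with duration, but not linearly.
hours = [1, 2, 4, 8, 16, 32]
pps = [-5, -1, 0, 2, 3, 400]

print(f"Pearson={pearson(hours, pps):.3f}, Spearman={spearman(hours, pps):.3f}")
```

Because the invented relationship is perfectly monotone, the rank correlation is at its maximum even though the linear correlation is not.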

I can only think of these issues in black or white because I don't have a deep intuitive understanding of the intricacies of statistics, but I've tried to make it as clear as possible. This is my first post to the group. Sorry for the length of the post. I couldn't explain my concerns in a shorter one. Thank you in advance for your replies and suggestions.







1 of 1 File(s)

Sun Aug 9, 2009 1:23 pm

nk_singh1